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Preface 



Nearly everyone of a technical bent who has thought about the problem of tracking for 
graphics for 15 minutes or so believes they have an easy answer— "Why don't you just..." 
This course (new for SIGGRAPH 2001) is designed to take you beyond those first few 
minutes of thought with an "under the hood" look at how these systems work. Our goal is 
to convey a basic understanding of the characteristics of the available technologies, their 
fundamental limitations, in what cases those limitations hurt you, and some things you can 
do to improve your results. 

In putting together this course pack we decided not to simply include copies of the slides 
for the course presentation, but to attempt to put together a small booklet of information 
that could stand by itself. The course slides and other useful information, including an 
electronic bibliographic version of "Tracking Bibliography" on page 95 are available at 

http : / /www . cs . unc . edu/~tracker /ref /s2 00 1 /tracker/ 

We expect that you (the reader) have a basic mathematical background, sufficient to 
understand explanations involving beginning statistics, random signals, and geometric 
transformations. 
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1. Introduction 



One of the important problems in Virtual Environment (VE) research today is that of 
providing a fast, accurate, and unobtrusive method for reliably tracking a computer user's 
real-world position and orientation or pose. Such tracking is necessary in VE systems 
because a user must continually be provided with two-dimensional computer generated 
images that match the user's three-dimensional real-world position and orientation. 
Similarly, human motion capture systems are often used for physiological studies (e.g., 
analysis of athletic motion) or special effects in motion pictures. Usually if the user's 
position and orientation are not tracked accurately or fast enough, disturbing or even 
harmful effects can be observed. 

Systems for tracking and motion capture for interactive computer graphics have been 
explored for over 30 years [Sutherland, 1968 #297]. Throughout the years commercial and 
research teams have explored mechanical, magnetic, acoustic, inertial, and optical 
technologies. Complete historical surveys include [Bhatnagar, 1993 #264; Burdea, 1994 
#267; Mulder, 1994 #286; Mulder, 1998 #288; Meyer, 1991 #432; Meyer, 1992 #285]. 
Commercial magnetic tracking systems for example [Ascension, 2000 #251; Polhemus, 
2000 #294] have enjoyed popularity as a result of a small user-worn component and 
relative ease of use. Optical systems have been developed for 3D motion capture 
[Woltring, 1974 #310; Woltring, 1976 #485; MAC, 2000 #282] and 6D tracking for visual 
simulations via head-worn displays [Wang, 1990 #301; Wang, 1990 #538; Ward, 1992 
#540; Welch, 1999 #307; Welch, 2001 #308]. Recently inertial hybrid systems have been 
gaining popularity because of the reduced high-frequency noise and direct measurements 
of derivatives [Foxlin, 1998 #270; Intersense, 2000 #277] . 

1.1 Course Description 

Every year, dozens of vendors display different systems for motion capture and tracking at 
the SIGGRAPH exhibition, while researchers continue to pursue new approaches in the 
laboratory. Why are there so many different approaches to this seemingly simple problem? 
How do the systems differ? What are the strong and weak points of each? How can you 
decide which is appropriate for your application? 

We will attempt to answer these questions and more in this course on some fundamental 
technologies behind tracking and motion-capture systems. We will use actual systems 
(commercially available) as examples, describing some algorithms used in popular 
magnetic, inertial, and optical tracking systems, relating the pros and cons of the systems 
to the fundamental technologies and the algorithms. While there have been previous 
SIGGRAPH courses on motion capture, they have primarily concentrated on the graphics 
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application. Instead we will take you "under the hood" of several systems so that you 
might better understand the performance (or lack thereof) that you experience with 
different applications, and perhaps improve your results by adjusting your setup to better 
match the technology. 

1.2 Speaker/Author Biographies 

Danette Allen is a Ph.D. Candidate in the Department of Computer Science at the 
University of North Carolina at Chapel Hill. Her primary research interest is tracking 
technologies but her interests also include hardware and software for man-machine 
interaction and virtual environments. Allen graduated from North Carolina State 
University with degrees in Electrical Engineering and Computer Engineering in 1988 and 
1989. She received her Master's Diploma in Business Administration from Manchester 
Business School, U.K. in 1990. In 1997, she received an M.E. in computer engineering 
from Old Dominion University. Allen began working at NASA Langley Research Center 
(LaRC) in Hampton, Virginia in 1991 where she is currently employed. She is a recipient 
of NASA's Silver Snoopy award, the astronauts' award for outstanding performance in 
flight safety and mission success. She is a member of the IEEE Computer Society and the 
Association of Computing Machinery. 

Gary Bishop is an Associate Professor in the Department of Computer Science at the 
University of North Carolina at Chapel Hill. His research interests include hardware and 
software for man-machine interaction, 3D interactive computer graphics, virtual 
environments, tracking technologies, and image-based rendering. Bishop graduated with 
highest honors from the Southern Technical Institute in Marietta, Georgia, with a degree in 
Electrical Engineering Technology in 1976. He completed his Ph.D. in computer science 
at UNC-Chapel Hill in 1984. Afterwards he worked for Bell Laboratories and Sun 
Microsystems before returning to UNC in 1991. 

Greg Welch is a Research Assistant Professor in the Department of Computer Science at 
the University of North Carolina at Chapel Hill. His research interests include hardware 
and software for man-machine interaction, 3D interactive computer graphics, virtual 
environments, tracking technologies, tele-immersion, and projector-based graphics. Welch 
graduated with highest distinction from Purdue University with a degree in Electrical 
Engineering Technology in 1986 and received a Ph.D. in computer science from UNC- 
Chapel Hill in 1996. Before coming to UNC he worked at NASA's Jet Propulsion 
Laboratory and Northrop-Grumman's Defense Systems Division. He is a member of the 
IEEE Computer Society and the Association of Computing Machinery. 
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2. Background 

2.1 Basic Coordinate Transforms 

This section briefly discusses the basic 2D and 3D geometrical transformations used in 
computer graphics and tracking. Most of the information is taken from [Foley, 1997 #562] 
and [Robinett, 1994 #565], the latter of which is included in Appendix C of this course 
pack. 

We assume a right-handed coordinate system as shown in Figure 2.1. 

y 
k 




Figure 2.1: Right-handed coordinate system 



Translation, rotation and scaling are the essential transformations in computer graphics 
and tracking. Translation displaces points by a fixed distance in a given direction. Scaling 
increases or decreases the size of an object. Rotation revolves a point around a specified 
axis. Figure 2.2 illustrates these transformations in 2D space. 



y 



-► x 




X 



V 



Figure 22: Translation, scaling and rotation (around the x-axis) in 2D 
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In a right-handed coordinate system, positive rotations are defined such that a 90° 
counterclockwise rotation transforms one positive axis into another, table 2.1 [Foley, 1997 
#562] follows from this convention. 



Axis of 
Rotation 


Direction of 
Positive 
Rotation 


X 


y to z 


y 


z to X 


z 


x to y 



Table 2.1: Positive Rotations 



2.1.1 Coordinate Systems 

Coordinate systems serve as a 6D basis to which all points, lines, objects, etc. are 
referenced. For example, in describing the human head, we might declare a head 
coordinate system with its origin at the extreme tip of the nose. From this origin, we can 
determine through a series of transformations (translation, rotation and scaling) where the 
eyes, ears, mouth, etc. are located. Beyond this we can think about a world coordinate 
system which references the origin of the head coordinate system in the world. 

Tracker measurements are provided in the tracker coordinate system. Consider an acoustic 
tracker that provides a range measurement d [m] from room-mounted transmitters to a 
sensor mounted on the user. This d , measured from the transmitter to the sensor, is 
referenced to the tracker's coordinate system. What we want is the user's head position in 
room or world coordinates. This is further complicated if we want to know the position of 
the user's eyes. We must transform from the trackers 's coordinate frame to the coordinate 
frame of the user's head. If we are rendering in stereo, from the user's head coordinate 
frame, we transform to the both of the user's eyes. 

2.1.2 Affine Transformations 

With respect to computer graphics, we use the term transform to refer to the mathematical 
operation of modifying a graphics primitive by adding, multiplying, and even dividing its 
numerical elements achieve such effects as translation, rotation, and perspective 
projection. In particular, translation, rotation and scaling are classified as affine 
transformations. They preserve parallelism of lines but not angles and lengths. Another 
less often used affine transformation is a form of distortion known as shearing. 
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2.1.2.1 2D Transformations 



We define a point in 2D space as a column vector, 



p = 



and a transformed point as 



P' = 



We left-multiply a point vector, P , by a transformation matrix, M, to acquire a new point 
vector, P , such that P* = MP . 

We can translate points to new positions by adding distances, d, to the coordinates of the 
original points. We define a translation vector, T, such that 



T = 



yj 



and the transformed point P' = P + T . where 

x 1 = x + d x 
and 

y' = y + d v 



Points can be scaled (stretched or shrunk) in the x and y directions by a scaling matrix, 5, 
such that 



S = 



s 0 



0 5 



yj 



and 



and 



y = y ■ ^ 
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Points can be rotated around the origin by an angle, 0 , with a rotation matrix, /?, where 

^ _ cos0 -sin 6 
sin0 cos0_ 

and 

x' = x • cos0 -y * sin0 
and 

y f = x * sin0 + y • cos0. 

Any and all of the transformations can be applied multiple times and in succession. For 
example, we can translate a point, scale it, rotate it and translate it again with 

P = T2 + (RS-(T1+P)). 

Suppose we have the following object as shown in Figure 2.3 [Foley, 1997 #562] 



y 



► x 



Figure 23: Object to be transformed 



The series of transformations described above might appear as shown in Figure 2.4. 



y 
A 



y 
i 



t a 



Figure 2.4: Series of transformations on an object 



2.122 Homogeneous Coordinates 

Notice that rotation, scaling and shearing are all multiplicative transforms while 
translation is an additive transform. We would like to be able to treat these transformations 
consistently to simplify transformation combinations. Homogeneous coordinates allow us 
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to apply all four transformations multiplicatively. In homogeneous coordinates, we add a 
third coordinate in 2D space, w . Now a single point, P , is represented as a three-element 
column vector where 



p = 



where w * 0 and typically w = 1 , 



When we add an "extra" dimension to the 2D coordinates, every 2D point [jc, y] T in 3D 
space represents a point along a line that passes through [jc, y, w] T where we want the 
specific point [jcj, y v Wj] T where Wj = 1 . If w does not equal 1, we simply divide all 
three elements by w to force w = 1 . 




Figure 2.5: 2D homogeneous coordinate space 



Any two sets of points [jc., y t , w £ ] T and [jc ^ y^ Wj] T represent the same point if one is a 
multiple of the other. Dividing P. and Pj by w i and Wj respectively will result in the 
same coordinates [jc, y, 1 ] T . Points at w = 0 are points at infinity. 

In 2D homogeneous coordinates, we have the following transformation matrices for 
translation, scaling and rotation: 



T = 



R = 



5 = 



I0d 2 
0 1 d } 
0 0 1 



cos0 -sin8 0 
sin 8 cosB 0 
0 0 1 



,,0 0 
0 s y 0 
0 0 1 
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Now we can translate a point, scale it, rotate it and translate it again with 

P = T2 R S TX P. 

2.1.23 3D Transformations 



We extend the idea of homogeneous coordinates to 3D space and represent a point as a 4- 
element vector, P , such that 



p = 



where w * 0 and typically w = 1 . 



and we have the following transformation matrices for translation and scaling: 



T = 



S = 



10 0^ 
0 1 0 d 
0 0 w, 
0 0 0 1 



s x 0 0 0 
0 j 0 0 
0 0^0 
0 0 0 1 



Rotation around a three-coordinate axis is described in the following section. 



2.13 Representing and working with orientation and rotations 



We represent orientation using a rotation from some known orientation just like we 
represent positions as a translation from some known position origin. The important 
difference between orientation and position is that orientation space is wrapped on itself in 
a way that linear position space is not. For example a rotation about the X axis of 45 
degrees will produce the same orientation as a rotation of 405 degrees about the X axis. To 
make matters more confusing there are also combinations of rotations about Y and Z that 
can produce the same final orientation. 

This wrapped nature of orientations is a constant source of difficulty when we attempt to 
implement even simple operations such as linear interpolation and filtering. We must be 
prepared to deal with apparent discontinuities in orientation that are really not 
discontinuities at all but rather differing ways to get to the same place. 
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Rotation Matrix 

The most commonly used representation for rotations is the rotation matrix as described in 
the previous section on 3D transforms. The upper 3 by 3 sub matrix of the 4 by 4 
homogenous rotation transform is a 3D rotation matrix. Its inverse is equal to its transpose 
and its determinant is 1 . We can usefully interpret the columns of a rotation matrix as the 
new coordinate axes projected onto the old (or origin) coordinate axes. 

The rotation matrix is the representation of choice for transforming points because only a 
simple and efficiently implemented matrix multiply is required. On the other hand, they 
are inappropriate for filtering or interpolation. Addition and subtraction of elements will 
almost certainly result in a matrix that is not a proper rotation. 



Euler Angles 

General rotations in 3D can be expressed as three successive rotations about different 
axes. For example, a transformation from reference axes to a new coordinate frame may be 
expressed as follows: 



rotation \p about z axis, R Y = 



rotation 6 about y axis, R 2 - 



rotation (|) about x axis, R^ = 



cosip sinip 0 
-sinip cosip 0 
0 0 1 

cos 6 0 -sin0 

0 1 0 
sin6 0 cos6_ 

10 0 
0 cos (J) sin(j> 
0 -sin(j) cos<|> 



(2.1) 



Finally, the full transformation can be expressed as the product of these three separate 
transformations. 



R — /?^/?2^| 



R = 



costJjcosO sinipcosO -sin 8 

sin^cosap sin6 - cos (|) sin cos(}>cosip + sin^sintpsinO sinc|)cos6 
cos(|)cosijj sin6 + sin (j) sin ip cos <J) sin opsin 6 - sin(J)cosip cos(j>cos6 



(2.2) 



Notice the asymmetry in the above matrix in relationship to the three angles. The order of 
the composition matters because matrix multiplication is not commutative. The 
mathematics is trying to tell us that the axes interact. Unfortunately the three Euler angles 
don't indicate this to us at all. If we attempt to filter or interpolate the three angles 
independently we are ignoring exactly this critical interaction. 
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Yaw, Pitch, and Roll 

People often used the words yaw, pitch, and roll to refer to orientations about a self- 
referenced coordinate frame. (Historically these terms have been used in navigation such 
as on ships and in planes.) Specifically if you are sitting upright, looking straight ahead, 
yaw would refer to rotating your head to the left or right around the axis of your neck and 
spine, pitch would refer to elevating or declining your chin up or down, and roll would 
refer to leaning your head toward one shoulder or the other. In other words if you place a 
right-handed coordinate system at the base of your head such that the Z axis is up and you 
are looking down the Y axis, yaw, pitch, and roll would correspond to rotation about the Z, 
X, and Y axes respectively. 

Quaternions 

Hamilton [Hamilton, 1853 #383] invented quaternions to enable division of vectors. You 
can find excellent introductions to quaternions in a SIGGRAPH paper by Shoemake 
[Shoemake, 1985 #551] and in the book "Quaternions and Rotation Sequences" by 
Kuipers [Kuipers, 1998 #553]. We have also included [Vicci, 2001 #566] in Appendix C. 

A quaternion Q consists of a vector augmented by a real number to make a four-element 
entity. It has a real part Q r and a vector part Q y . If Q r is zero, Q represents an ordinary 
vector; if Q v is zero, it represents an ordinary real number. A unit quaternion has the sum 
of the squares of its four elements equal to 1 . 

Unit quaternions represent rotations. The vector part of the quaternion specifies the axis of 
rotation. The real part of the quaternion is the cosine of half the rotation angle. Thus, a 
quaternion {Q r , Q v } represents a rotation of 2 acos Q r about the axis Q y following the 
right-hand rule. 

From the above, we can see that the identity rotation is represented by the quaternion 
{1,[0, 0, 0]} which specifies a rotation of 0 degrees about an unspecified axis. Here are 
some other simple rotations: 



Quaternion addition is accomplished simply by adding like parts; real part to real part and 
vector part to vector part. 

Multiplication of quaternions P = QR is defined as 




270 degrees about Z 




{P r ,P) = {Q r R r -Q v R v ,Q r R v + R,Q v + Q v ®R v } 



(2.3) 
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This equation says that the real part of the result is the product of the real parts minus the 
inner product of the vector parts. The vector part of the result is the real part of Q times 
the vector part of R , plus the real part of R times the vector part of Q , plus the cross 
product of the vector parts. Quaternion multiplication composes the rotations of the two 
quaternions. 

The inverse of a quaternion has the same real part and the negative of the vector part. 

In order to rotate a point 5 by a quaternion Q we evaluate QSQ 1 where the 
multiplication is quaternion multiplication. In this multiplication, think of the vector S as 
the vector part of a quaternion with zero real part. Though it isn't obvious, this triple 
product will always result in a quaternion with a zero real part. The vector part will be the 
rotated point. 

To convert a quaternion {w, [x, y, z]} to an equivalent rotation matrix we can evaluate: 



R = 



l-2y 2 - 2z 2xy + 2wz 2xz - 2wy 
2xy-2wz \-2x -2z 2yz + 2wx 
2xz + 2wy 2yz - 2wx \-2x - 2y\ 



(2.4) 



Watch out when you are given four numbers said to represent a quaternion rotation; there 
is no agreement on the order of the elements. Some systems specify the real part followed 
by the vector part, others specify the real part last. It is impossible to tell by examining the 
four elements which is the real part unless the rotation is known. 

The unit quaternions can be thought of as points on a 4D sphere. Each point represents a 
rotation. Each point would represent a unique orientation except for the difficulty that 
points connected by a line through the center of the sphere (that is Q and -Q) represent the 
same rotation. In a time sequence of quaternions we sometimes see apparent jumps that 
are actually just a result of this ambiguity. When filtering or interpolating in a sequence it 
is important to handle these apparent discontinuities. 

Interpolations among quaternions are properly carried out with spherical interpolation on 
the 4-sphere [Shoemake, 1985 #551]. Linear operations are equivalent to moving along a 
chord of the sphere rather than on the surface. In a small region, the sphere appears to be 
flat so the locally linear operations are a good approximation. Just be sure to renormalize 
the quaternions that result. Don't attempt linear operations on quaternions that are far 
apart in orientation. Vicci [Vicci, 2001 #566] introduces a new approach to averaging 
rotations and orientations represented as quaternions. 



Small Euler A ngles 

When the angles are very small, the Euler angle rotation matrix above takes on a 
particularly simple and useful form. For small angles (expressed in radians) sin a -> a 
and cos a -* 1 . The sine approximation has relative error of about 0.5% at 10 degrees and 
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the error decreases quadratically as the angle gets smaller. The relative error of the cosine 
approximation is about 3 times that in the sine approximation. The rotation matrix in 
equation (2.2) reduces with the small angle approximation to 



R = 



1 -9 

ip l 4> 
6 i_ 



In this form, the angles behave linearly allowing simple interpolation and filtering. This 
has become our representation of choice. As described in [Welch, 1997 #304], we 
represent orientation with two terms, a global orientation represented as a quaternion and a 
local perturbation of that orientation represented as small Euler angles. All filtering 
operations are done on the small angles making use of their local linearity. The filtered 
angles are used to update the global orientation so that the small angle property is 
preserved. 

Small Euler angles also have a simple relationship to quaternions. For small angles the 
approximate quaternion is: 

Q = {i-4> 2 -e 2 -^ 2 ,[(|),9,^]} 
2.1.4 Coordinate System Transforms 

Let M A B denote a transform (scale, rotation, translation) from coordinate system B to 
coordinate system A. We represent points as column vectors, therefore p A = M A B • p B 
is the transform of the point p B in coordinate system B, by M A B , to poinfp^ in 
coordinate system A. The composition of two transforms is given by 
^ A C = ^ A B * C an( * trans f° rm a point in coordinate system C into coordinate 
system A. Figure 2.6 [Robinett, 1994 #565] shows a diagram of a point p and its 
coordinates in coordinate systems A and B . 




Figure 2.6: Transformation M A B 



We need a sequence of transformations to express the target position in the world 
coordinate system, a Head_World transform. As described in [Robinett, 1994 #565], the 
primary function of the Head_World transform is to contain the measurement made by the 
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tracker of head position and orientation, which is updated each display frame as the user's 
head moves around. The tracker hardware measures the position and orientation of a small 
movable sensor with respect to the fixed tracker coordinate frame. 

The two components of tracker hardware, the tracker's base and the tracker's sensor, have 
native coordinate systems associated with them by the tracker's hardware and software. If 
the tracker base is bolted onto the ceiling of the room, this defines a coordinate system for 
the room with the origin up on the ceiling and with the X, Y, and Z axes pointing 
whichever way it was mechanically convenient to mount the tracker base onto the ceiling. 
In a VR environment, the sensor is mounted somewhere on the rigid structure of a head- 
mounted display, and the HMD inherits the native coordinate system of the sensor. 

So, the Head_World Transform is actually comprised of a series of transforms from World 
to Tracker Base to Head Sensor to Head. The transformation is decomposed into 

M HJN ~ M H_HS ' M HS_TB ' M TB_W 

We can transform to the user's eyes or any other point in the user's coordinate system with 
an additional transformation such as M LE H to transform from the head to the left eye. 

2.2 Probability and Random Variables 

What follows is a very basic introduction to probability and random variables. For more 
extensive coverage see for example [Maybeck, 1979 #93; Brown, 1996 #246; Kailath, 
2000 #279]. 

22.1 Probability and Random Variables 

Most of us have some notion of what is meant by a "random" occurrence, or the 
probability that some event in a sample space will occur. Formally, the probability that the 
outcome of a discrete event (e.g., a coin flip) will favor a particular event is defined as 

p/ A \ _ Possible outcomes favoring event A 
Total number of possible outcomes * 

The probability of an outcome favoring either A or B is given by 

P(AUB) = P(A) + P(B). (2.5) 

If the probability of two outcomes is independent (one does not affect the other) then the 
probability of both occurring is the product of their individual probabilities: 

P(AHB) = P(A)P(B). (2.6) 

For example, if the probability of seeing a "heads" on a coin flip is 1/2, then the 
probability of seeing "heads" on both of two coins flipped at the same time is 1/4. (Clearly 
the outcome of one coin flip does not affect the other.) 
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Finally, the probability of outcome A given an occurrence of outcome B is called the 
conditional probability of A given B , and is defined as 

P(A\B) = Pi p^ ] - (2-7) 

Random Variables 

As opposed to discrete events, in the case of tracking and motion capture we are more 
. typically interested with the randomness associated with a continuous electrical voltage or 
perhaps a user's motion. In each case we can think of the item of interest as a continuous 
random variable, A random variable is essentially a function that maps all points in the 
sample space to real numbers. For example, the continuous random variable X(t) might 
map time to position. At any point in time t (time is the sample space) X(t) would tell us 
the expected position. 

In the case of continuos random variables, the probability of any single discrete event A is 
in fact 0. That is, P( A) = 0 . Instead we can only evaluate the probability of events within 
some interval. A common function representing the probability of random variables is 
defined as the cumulative distribution function: 



This function represents the cumulative probability of the continuous random variable X 
for all (uncountable) events up to and including a . Important properties of the cumulative 
distribution function are 



Even more commonly used than equation (2.8) is its derivative, known as the probability 
density function : 



Following on the above given properties of the cumulative probability function, the 
density function also has the following properties: 



F x (x) = P(-oo,*]. 



(2.8) 



1. F x (x)-^0asjc^-oo 

2. F x (x) -> 1 as x^> +<*> 

3. F x (x) is a non-decreasing function of x. 



(2.9) 



1 . f x (x) is a non-negative function 



00 
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Finally note that the probability over any interval [a, b] is defined as 

P x [a 9 b] = ff x (x)dx. 

So rather than summing the probabilities of discrete events as in equation (2.5), for 
continuous random variables one integrates the probability density function over the 
interval of interest. 

Mean and Variance 

Most of us are familiar with the notion of the average of a sequence of numbers. For some 
N samples of a discrete random variable X , the average or sample mean is given by 

_ x { + x 2 + ... + x N 

X - ~ N ' 

Because in tracking we are dealing with continuous signals (with an uncountable sample 
space) it is useful to think in terms of an infinite number of trials, and correspondingly the 
outcome we would expect to see if we sampled the random variable infinitely, each time 
seeing one of n possible outcomes jCj . . . x n . In this case, the expected value of the discrete 
random variable could be approximated by averaging probability-weighted events: 

_ (p l N)x l + (p 2 N)x 2 + ... + (p n N)x N 

X » . 

N 

In effect, out of N trials, we would expect to see (p^) occurrences of event x x , etc. This 
notion of infinite trials (samples) leads to the conventional definition of expected value for 
discrete random variables 

n 

Expected value of X = E(X) = J Pi x i ( 2 - 10 ) 

i= 1 

for n possible outcomes x^ ...jc n and corresponding probabilities P\--P n • Similarly for 
the continuous random variable the expected value is defined as 

Expected value of X = E(X) = J° xf x {x)dx. (2.11) 

Finally, we note that equation (2.10) and equation (2:11) can be applied to functions of the 
random variable X as follows: 

n 

E(g(X)) = ^ Pi8(* t ) (2-12) 
i = 1 
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and 

E(g(X)) = f g(x)f x (x)dx. (2.13) 

J —00 

The expected value of a random variable is also known as the first statistical moment. We 
can apply the notion of equation (2.12) or (2.13), letting g(X) = X k , to obtain the k ih 
statistical moment. The k ih statistical moment of a continuous random variable X is given 

by 

E(X k ) = f° x k f x {x)dx. (2.14) 

J —OO 

Of particular interest in general, and to us in particular, is the second moment of the 
random variable. The second moment is given by 

E(X 2 ) = £j 2 f x ( x ) dx - (2- 15 > 

When we let g(X) = X-E(X) and apply equation (2.15), we get the variance of the 
signal about the mean. In other words, 

Variance X = E[(X-E(X)) 2 ] 
= E{X 2 )-E(X) 2 . 

Variance is a very useful statistical property for random signals, because if we knew the 
variance of a signal that was otherwise supposed to be "constant" around some value— the 
mean, the magnitude of the variance would give us a sense how much jitter or "noise" is in 
the signal. 

The square root of the variance, known as the standard deviation, is also a useful statistical 
unit of measure because while being always positive, it has (as opposed to the variance) 
the same units as the original signal. The standard deviation is given by 

Standard deviation of X = o x = ^/Variance of X . 
Normal or Gaussian Distribution 

A special probability distribution known as the Normal or Gaussian distribution has 
historically been popular in modeling random systems for a variety of reasons. As it turns 
out, many random processes occurring in nature actually appear to be normally 
distributed, or very close. In fact, under some moderate conditions, it can be proven that a 
sum of random variables with any distribution tends toward a normal distribution. The 
theorem that formally states this property is called the central limit theorem [May beck, 
1979 #93; Brown, 1996 #246]. Finally, the normal distribution has some nice properties 
that make it mathematically tractable and even attractive. 
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The normal distribution is characterized by the following probability density function: 
where the expected value is 

m * = fL xf xW dx 

and the squared-variance is 

o 2 = f (x-m x ) 2 f x (x)dx. 

J —00 

Graphically, the normal distribution is what is likely to be familiar as the "bell-shaped" 
curve shown below in Figure 2.7. 



J 
















a A 



x->-oo 0 m x x -* oo 



Figure 2.7: The Normal or Gaussian probability distribution function. 



Independence and Conditional Probability for Continuous Variables 

As with the discrete case and equations (2.6) and (2.7), independence and conditional 
probability are defined for continuous random variables. Two continuous random variables 
X and Y are said to be statistically independent if their joint probability f X Y^ x -> xs 
equal to the product of their individual probabilities. In other words, they are considered 
independent if 
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In addition, Bayes' rule follows from (2.7), offering a way to specify the probability 
density of the random variable X given (in the presence of) random variable Y . Bayes' 
rule is given as 

_ fy\x(y)fx(x) 
Spatial vs. Spectral Signal Characteristics 

In the previous section we looked only at the spatial characteristics of random signals. As 
stated earlier, the magnitude of the variance of a signal can give us a sense of how much 
jitter or "noise" is in the signal. However a signal's variance says nothing about the 
spacing or the rate of the jitter over time. Here we briefly discuss the temporal and hence 
spectral characteristics of a random signal. Such discussion can be focused in the time or 
the frequency domain. We will look briefly at both. 

A useful time-related characteristic of a random signal is its autocorrelation — its 
correlation with itself over time. Formally the autocorrelation of a random signal X(t) is 
defined as 

R x (t v t 2 ) = E[X(f,)X(f 2 )] (2.16) 

for sample times t l and t 2 I If the process is stationary (the density is invariant with time) 
then equation (2.16) depends only on the difference x = f j - 1 2 . In this common case the 
autocorrelation can be re-written as 

R x (x) = E[X(t)X(t + x)]. (2.17) 

Two hypothetical autocorrelation functions are shown below in Figure 2.7. Notice how 
compared to random signal X 2 , random signal Xj is relatively short and wide. As |x| 
increases (as you move away from x = 0 at the center of the curve) the autocorrelation 
signal for X 2 drops off relatively quickly. This indicates that X 2 is less correlated with 
itself than X x . 

Clearly the autocorrelation is a function of time, which means that it has a spectral 
interpretation in the frequency domain also. Again for a stationary process, there is an 
important temporal-spectral relationship known as the Wiener- Khinchine relation: 

S x (j(o) = $[R x (x)] = f R x (x)e-* m di 

' * J —00 

where indicates the Fourier transform, and co indicates the number of (2k) cycles 
per second. The function S x (jio) is called the power spectral density of the random 
signal. As you can see, this important relationship ties together the time and frequency 
spectrum representations of the same signal. 
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White Noise 



An important case of random signal is the case where the autocorrelation function is a 
dirac delta function 6(x) which has zero value everywhere except when t = 0 . In other 
words, the case where 



Ry(%) = 



pf x = 0 then A 
I else 0 



for some constant magnitude A . In this special case where the autocorrelation is a "spike" 
the Fourier transform results in a constant frequency spectrum, as shown in Figure 2.9. 
This is in fact a description of white noise, which be thought of both as having power at all 



R x (x) 



S x (ju>) 



-r q x 0 co 

Figure 2.9: White noise shown in both the time (left) and frequency domain (right). 



frequencies in the spectrum, and being completely uncorrected with itself at any time 
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except the present (x = 0). This latter interpretation is what leads white noise signals to 
be called independent. Any sample of the signal at one time is completely independent 
(uncorrelated) from a sample at any other time. 

While impossible to achieve or see in practice (no system can exhibit infinite energy 
throughout an infinite spectrum), white noise is an important building block for design and 
analysis. Often random signals can be modeled as filtered or shaped white noise. Literally 
this means that one could filter the output of a (hypothetical) white noise source to achieve 
a non- white or colored noise source that is both band-limited in the frequency domain, and 
more correlated in the time domain. 
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3. Classifications of Devices and Systems 

There are many dimensions to the design space of tracking and motion capture systems, 
and you can consider the design in the context of any one or more of these dimensions. We 
like to think about this classification as "many ways to slice the problem." For example, 
there is the dimensionality of the information provided by the sources and sensors; the 
geometric arrangement of the sources and sensors; whether they offer absolute or relative 
references; passive vs. active; signal to noise ratio; accuracy; resolution; bandwidth; 
latency; update rate; reliability /repeatability; required infrastructure, and on and on. Here 
we look primarily at two classifications: by physical medium and by geometric 
configuration of the devices. We then end this chapter by considering a combination of 
mediums in hybrid systems. 

3.1 By Physical Medium 

Here we describe the most practical (hence common) physical mediums employed in 
tracking and motion capture. We thank Leandra Vicci (UNC-Chapel Hill) for her valuable 
contributions to this section. 

3.1.1 Acoustic Tracking 

Acoustic trackers typically use ultrasonic sound waves to sense range. Ultrasonic waves 
have a frequency above the audible range of the human ear of approximately 20,000 [Hz]. 

3.1.1.1 The Geometry 

A single transmitter/receiver pair provides a distance measurement of the target from a 
fixed point. In the absence of further information, this defines a sphere on whose surface 
the target is located. A shown in Figure 3.1 [Oceanographers, 2001 #561], the addition of 
a second receiver or transmitter restricts this surface to the circle of intersection between 
the two spheres. A third receiver or transmitter restricts this circle to two points, one of 
which can normally be rejected, and determines a 3D position. Therefore, either three 
transmitters and one receiver or three receivers and one transmitter are required to find 3D 
position. To estimate position and orientation, three transmitters and three receivers are 
required. 1 



1 . The reader might be familiar with GPS navigation units in which four satellites are used for 
position estimation. The fourth satellite is used to constrain timing differences in the other three. 
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Figure 3.1: Intersection of two spheres (a circle) and three spheres (two points) 
3.1.1.2 Techniques 

Acoustic trackers typically employ one of two techniques to determine position and 
orientation: 

1 . Time of Flight (TOF), and 

2. Phase Coherence. 

In both methods, the speed of sound is used to convert the time to distance. The drawbacks 
to any time-of-flight or phase difference method is the inherent delay in waiting for the 
signal to travel from the source to the destination and this is exaggerated by the slow speed 
of sound. At 0° C the speed of sounds in air is 331 [m/s], approximately 1 .1 [ft/ms]. To 
complicate matters, the speed of sound varies with temperature and pressure and cannot be 
treated as a constant. More generally, the speed of sound in a gas is 



where y is a thermodynamic constant of air, R is the ideal gas constant, M is the 
molecular weight and T is absolute temperature. 

Time of Flight (TOF) 

The TOF method measures the time t it takes for an ultrasonic pulse to travel from a 
transmitter to a receiver to provide an absolute distance, d . It takes the time required for a 
sound wave to travel form the source to the destination and multiplies that by the speed of 
sound v to get distance. 





An absolute position is estimated from this distance. 
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Phase Coherence 

The phase coherence method measures the phase difference between the sound wave at 
the receiver and the transmitter to provide a change in distance, 5 . All signals can be 
represented as a sum of sinusoidal functions of the form Acos((of-<|)) where <j) is the 
phase shift of the signal and A is its amplitude. From this phase difference between the 
emitted and received signals of a known wavelength or frequency, distance can be 
computed. 

Figure 3.2 below depicts two cosine waves with amplitude of 1 and frequency of 1 Hz. 
The solid curve is cos(jc) with no phase shift, and the dashed curve is cos(x- jt/4) 
where jc/4 is the phase shift. A phase angle of 360° or 2k [radians] is equal to one cycle. 




radians 



Figure 3.2: Cosine functions with phases of 0 and jt/4 [radians] 



The frequency of an acoustic wave, / [Hz], where [Hz] is [1/s], is related to its speed, 
c [m/s] , and wavelength, X [m] , by the equation c = Xf where c varies with temperature 
and pressure as described previously. The fraction 6 of the wavelength X corresponding 
to the phase shift can be computed by 

xr i ir i ^W radians] 
6[m] = X[m]- 24radians] 

= c[m/s] . *W radians] 
/[Hz] 2jt[radians] 
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If the speed of sound is 331 [m/s] and, as in the graph above, the transmitted signal had 
zero phase and the received signal had a phase of Jt/4 [radians], we calculate a phase 
difference of +7t/4 [radians] and know that the signal traveled 

_ c[m/s] <W radians] 
L J /[Hz] 2jt[radians] 

= 33t[m/s] . i [radianS] 
l[Hz] 2rt[radians] 

= 331[m]-| 

= 41.375[m]. 

If we assume an ultrasonic frequency commonly used in acoustic tracking of 40 [kHz] 
with the same measured phase shift (1/8 of a cycle), we find 

c[m/s] <bdelayl Tadi!inS ] 

LmJ /[Hz] 2n[radians] 

_ 331[m/s] , 1 
40[kHz] 8 

= 1.034[mm]. 

Notice that a phase of <j> + (n ■ 2jc) looks just like a phase of <j> at the receiver, resulting in 
an ambiguity in distance. This is usually resolved by assuming phase changes are small 
between measurement updates. For example, given an acoustic frequency of 40 [kHz], we 
calculate a wavelength of 8.275 [mm] and measure a phase difference of 0.125 as above. 
Without previous position information, we cannot determine whether the target is at a 
distance of 0.125 wavelengths, 17.125 wavelengths or n + 0.125 wavelengths. However, 
if we know the current position estimate is at 100 wavelengths, 0.8275 [m], we assume 
that the new position is at the closest wavelength count to the current count, 100.125 
wavelengths. Therefore the target has moved 8 = 1.034[mm] and the new position 
estimate of the target is at 

d' = d + b 

= 0.8275[m] + 1.034 x 10 _3 [m] • 
= 0.82753[m] 

or, simply 

d' = 100.125[wavelengths] • 8.275T - 1 

LwavelengthJ 

= 828.53[mm] 
= 0.82753[m]. 
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Note that acoustic energy diminishes with the square of the distance between the 
transmitter and receiver. 

3.1.13 Commercial and Research Products 

Commercial products that employ acoustic sensors include Infusion Systems' FarReach, 
Intersense's IS-600 Mark 2 and Mark 2 PLUS (inertial hybrid). 




Figure 33: Intersense's IS-600 Mark 2 



Other acoustic trackers include M.I.T.'s Lincoln Wand (1966). See also [Mulder, 1994 
#287] for more examples. 

3.1.2 Inertial Tracking 

Inertial trackers use accelerometers to measure the acceleration for object position and 
gyros to measure the orientation of object orientation. They are passive, relying on 
Newton's second law of motion, F = ma, and its rotational equivalent M = la , which 
means there are no physical limits on the working volume and the user is able to move 
around unencumbered in the environment. Ideally, both are deployed in orthogonal triples 
(for 3D position in x, y and z and 3D orientation in roll, pitch, and yaw) in order to 
estimate 6D pose. 

3.12.1 Accelerometers 

Accelerometers actually measure the force exerted on mass since acceleration cannot be 
measured directly. This measured force, F , for a given mass, m , is transformed into a 
measure of acceleration, a , by the relationship F = ma . The primary transducer of an 
accelerometer converts acceleration into displacement. Since 

2 

d r 
dt 
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position r is calculated as 

r=tfadt 2 . 

Accelerometers use a known mass (sometimes called the proof-mass) attached to one end 
of a damped spring. The other end of the spring is attached to the accelerometer housing. 
When there is no acceleration imposed upon the accelerometer, the spring is at rest and 
exhibits zero displacement. 
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Figure 3.4: Spring at rest with zero displacement. 

If a force is applied to the housing, the housing will accelerate but inertia causes the 
suspended mass to lag behind, resulting in a displacement. 
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Figure 3.5: Spring under acceleration with displacement. 

The displacement of the mass and extension/compression of the spring is proportional to 
the acceleration of the housing or, in the case or head tracking, the acceleration of the 
wearer. 

A secondary transducer converts this displacement into a usable signal. Two common such 
transducer types are potentiometric and piezoelectric. Potentiometric devices attach the 
displacement of the mass to the slider of a potentiometer. The output voltage of a 
potentiometer is linearly proportional to the slider position. Voltage varies directly with 
current and the electrical resistance supplied by the potentiometer by V = 1R where V is 
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voltage, / is current and R is resistance. Piezoelectric devices attach the displacement to a 
piezoelectric element (a piezoelectric crystal that produces an electric charge when a force 
is exerted upon it) that generates a voltage proportional to the displacement. 

3.1.2.2 Gyroscopes 

Gyroscopes employ the principle of conservation of angular momentum. If torque is 
exerted on a spinning mass, its axis of rotation will precess at right angles to both itself 
and the axis of exerted torque. If the mass spins very fast, it will have a large angular 
momentum that will strongly resist changes in direction. 

Figure 3.6 illustrates the phenomenon of precession. If we fix the axis of rotation on one 
end of the gyroscope allowing it to pivot around this point, the force of gravity simply 
causes the gyroscope to fall down in the direction of the gravity vector if it is not spinning. 
However if the gyroscope is spinning, gravity exerts a torque on the gyroscope about an 
orthogonal axis to the axis of rotation. If the gyroscope is spinning at a sufficient rate, the 
gyroscope does not fall in the direction of gravity. Instead it rotates around the axis that is 
orthogonal to both the gyroscope's axis of rotation and gravity's torque axis. 



Non-Spinning Mass Spinning Mass 




g ravit y precession 

Figure 3.6: Precession 

If the spinning mass is mounted on gimbals, these principles can be used to measure 
changes in direction. The angle that the gyroscope makes with it's housing (gimbal 
deflection) is a measure of the angular momentum or angular velocity. If the gimbals are 
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constrained with springs, the rate of change of direction can be measured. This 
configuration is known as a rate gyro. Examples of mechanical rate gyroscopes are shown 
in Figure 3.7 [Britannica, 1994 #559]. 




Figure 3 J: Rate Gyroscopes for measuring rate of turn (left) and rate of roll (right) 

For 3D orientation (roll, pitch and yaw), three rate gyroscopes are typically fitted to a 
platform with their axes mutually perpendicular. Two of the gyroscopes provide for 
horizontal stabilization of the platform— an essential requirement to eliminate the influence 
of accelerations due to gravity— while the third is responsible for the north-south 
alignment. Pitch, roll, and yaw are detected by the three gyroscope input axes. The gimbal 
deflection of each of the gyroscopes is converted into a signal. 

The physics supporting inertial measuring devises are most easily illustrated using 
mechanical devices. While extremely accurate, these devices are not used for human 
tracking because of their size and mass. Instead micromechanical devices are often 
employed. An example of is BEI Systron Donner Inertial Division's GyroChip that uses a 
vibrating piezoelectric quartz tuning fork to sense rate. 
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Figure 3.8 illustrates the micromechanical gyro designed as an electronically-driven 
resonator. When an angular rate is applied to its drive tines, Coriolis or torsional forces are 
exerted on its drive tines which cause its vibrations to couple to the pickup tines. The 
pickup fork vibrations are detected and used to measure the angular rate. 




Figure 3.8: BEI Systron Donner Inertial Division's GyroChip technology 

Systron Donner [Donner, 2001 #560] offers the following explanation. 

The piezoelectric drive tines are driven by an oscillator to vibrate at a precise 
amplitude, causing the tines to move toward and away from one another at a high 
frequency. This vibration causes the drive fork to become sensitive to angular rate 
about an axis parallel to its tines, defining the true input axis of the sensor. 

Vibration of the drive tines causes them to act like the arms of a spinning ice skater, 
where moving them in causes the skater's spin rate to increase, and moving them out 
causes a decrease in rate. For vibrating tines ("arms"), an applied rotation rate causes a 
sine wave of torque to be produced, resulting from "Coriolis Acceleration," in turn 
causing the tines of the Pickup Fork to move up and down (not toward and away from 
one another) out of the plane of the fork assembly. 

The pickup tines thus respond to the oscillating torque by moving in and out of plane, 
causing electrical output signals to be produced by the Pickup Amplifier. Those signals 
are amplified and converted into a DC signal proportional to rate by use of a 
synchronous switch (demodulator) which responds only to the desired rate signals. 

The DC output signal of the GyroChip is directly proportional to input rate, reversing 
sign as the input rate reverses, since the oscillating torque produced by Coriolis 
reverses phase when the input rate reverses. 
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Regardless of the technology, both accelerometers and gyros provide derivative 
measurements. Linear accelerations must be integrated twice and angular rates need to be 
integrated once to derive position and orientation, respectively. This integration causes 
these inertial measurements to be sensitive to drift. Position error diverges with time as 
errors accumulate. This drift can be combated with periodic recalibration by the use of 
another tracking method that corrects the accumulated error periodically. Conversely 
inertial tracking performance is very good at high frequency and over short time intervals. 

Models useful in explaining the frequency characteristics of an inertial system are shown 
in Figure 3.9. The dashed box marked "User" contains a question mark to indicate that the 
user's acceleration (motion) is unknown. The dashed box marked "Accelerometers" shows 
acceleration measurement noise E a ^ being summed with the ideal user acceleration 
signal, and the sum then being integrated twice to obtain a position estimate. The lower 
part of the figure shows the corresponding transfer function coefficients for spectral 
(frequency) analysis. 
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Figure 3.9: Integration in time domain and division 
in frequency domain of inertial measurements. 



The user's true acceleration is their position twice differentiated, i.e. the user's 
acceleration is weighted by the square of the frequency (s) of their motion. This 
acceleration signal is then weighted by the inverse square of the frequency (s) as it is 
integrated twice in the inertial system to obtain a position estimate. The end result is a 
unity frequency weighting of the position in the final estimate. 
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From Figure 3.9 we also see that any electrical noise e a ^ incurred during the 
measurement of the accelerometer output is weighted solely by tne inverse square of the 
frequency (s) as it is integrated twice in the inertial system. The end result is an inverse 
square frequency weighting of electrical measurement noise in the final position estimate. 

100 



10 = 

Magnitude t 
(LoglO) ] 

0.1 



Frequency 

Figure 3.10: Logarithmic plots of typical accelerometer-based 
position signal vs. noise (for constant velocity). 

In Figure 3.10, the estimated position signal, noise, and signal-to-noise ratio for an inertial 
system using accelerometers are plotted together against frequency. This figure 
demonstrates why the practical application of a solely inertial-based tracking system is 
impractical. At low frequencies the position estimate noticeably diverges as measurement 
noise is erroneously interpreted as acceleration. The most common sources of low 
frequency noise are the unavoidable and often time-dependent random "DC" (or very low 
frequency) biases. Such bias errors can cause an inertial-based tracker to report that a 
subject is moving even when that subject is completely still, or conversely to report that a 
subject is still when in fact they are slowly moving. The result is that in general for inertial 
devices to be practical they must be combined with some other mediums as described in 
Section 3.3. 

An additional source of error is misalignment with the gravity vector whose effects must 
be subtracted out of inertial measurements. The effect of gravity can be significant. One 
degree of tilt error over ten seconds can cause nine meters of position error. 

Noise and quantization error in the signals from the inertial sensors is another important 
source of error. Figure 3.11 shows how error accumulates in inertial systems. The curves 
are contours of time to 0.1 [m] of accumulated error for accelerometer (vertical axis) and 
rate gyro (horizontal axis) signals with the indicated number of useful bits. For example, 
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with 15 bits of useful dynamic range on the rate gyro signal and about 12 bits of useful 
range on the accelerometer signal, the graph predicts about 20 seconds of operation before 
10 [cm] of error accumulates. Increasing the range of the accelerometers will not improve 
the system performance because the rate gyros are the limiting factor. The shape of the 
curves can be explained by realizing that rate gyro errors will result in misestimation of 
the system tilt. As explained earlier, tilt errors cause fractions of the gravity vector to be 
integrated as actual acceleration resulting in potentially large position errors. Region 1 
includes inertial units readily available today. Region 2 includes today's peak 
performance. Region 3 reflects arguably unachievable performance. 



Effect of Se nso r No ise o n I nertial-On ty Perfo mnance (time to0.1 mefers error) 




Rate gyro dynamic range (5rad/s divided by noise) bits 

Figure 3.11: Time to 0.1 [m] error 
3.1,23 Commercial and Research Products 

Inertial trackers have become more common over the last few years due to IC technologies 
allowing for significant reduction in size. Because they require some form of periodic 
calibration to control drift they are typically used in hybrid tracking products. Commercial 
products that employ inertial sensors Ascension's 3D-BIRD, Intersense's IS-300/600 and 
InterTrax 2. See also [Mulder, 1994 #287] for more examples. 
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Ascension's 3D-BIRD 




Intersense's IS-300/600 
Intersense's InterTrax 

Figure 3.12: Some example inertial tracking systems. 



3.13 Magnetic Tracking 

Magnetic trackers use magnetic fields to measure range and orientation. These magnetic 
fields can be low frequency AC fields or pulsed DC fields and three orthogonal triaxial 
coils are used at both the transmitter and receiver to produce position and orientation 
measurements. 

3.13.1 Magnetic Fields 

Generating magnetic fields 

Current carrying coils are used to generate the source magnetic fields. The magnetic field 
produced by a circular coil of wire carrying a current, / , at a distance d and off-axis angle, 
6 , is described by 

M 

H r - -cos (8) (radial component) 

2nd 5 

M 

//x = -sin (6) (tangential component in 6 direction) 

//^ = 0 (tangential component in § direction) 
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where H r and H Q are the radial and tangential components of the field, M is the 
magnetic moment of the loop (M = NIA) , A and N are the area enclosed by the current 
loop and number of the turns of the loop or winding [Raab, 1979 #174] and / is the 
current. Figure 3.13 illustrates the magnetic field of a single-turn winding. 




Figure 3.13: Magnetic Dipole 



Detecting magnetic fields 

A time varying magnetic field will induce a voltage in a coil that can be measured 
electrically. The magnitude of the voltage is proportional to the area circumscribed by the 
coil, the rate of change of the field, and varies as cos 6 where 0 is the angle between the 
direction of the field lines and the axis of the coil. 

3.13.2 System Configuration 

A magnetic tracking system consists of a transmitter and a receiver in the form of coils. A 
ID sensor for estimating the position in z (i.e. direction of gravity) is made up of a single 
coil transmitter oriented in the z-direction. When current is applied to the coil a magnetic 
field is generated. At the receiver, this induces a maximum voltage proportional to the 
sensed magnetic field strength in a receiving coil oriented in the same direction as the 
field. 

The induced voltage level provides information about the both distance from the 
transmitter to the receiver and the axis-alignment between them. Boundaries of equal 
accuracy are found along a hemisphere or sphere around the transmitter. In Figure 3.14 
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[Burdea, 1994 #267] the accuracy and amplitude A2 is less than A \ where the radius R2 
is greater than Rl . Here the radius (distance form the transmitter) is the determining 
factor. 




Figure 3.14: Accuracy contours around a coil 

Three separate coils wound orthogonally around a core are used to generate and measure 
the magnetic field strength in x, y and z. When three orthogonal coils are used, the three 
source coils are activated serially and the induced signal in each of the three receiving 
coils is measured. A full measurement cycle contains three measured values for each of. 
the three source coils and this nine-element measurement is used to calculate the position 
and orientation of the receive coils relative to the source coils. The signal strength per 
receive coil decreases cubically with distance and with the cosine of the angle between its 
axis and the local magnetic field direction. The strength of the induced signals can be 
compared to the known strength of the transmitted signals to find distance. The strength of 
the induced signals are compared to each other to find orientation. 

One disadvantage of AC magnetic sensors is that ferromagnetic and other conducting 
objects within the sensor space can distort the magnetic field geometry. An eddy current is 
induced in conducting materials by the source magnetic field (and other fields such as the 
Earth's magnetic field) and these currents produce small magnetic fields around the 
conducting materials. The fields cause distortions in the source fields shown above 
resulting in erroneous pose estimates. This is particularly a problem when using AC 
transmitters because of the continuously varying nature of AC signals. 

The use of DC transmitters overcomes the eddy current interference problem. Eddy 
currents are generated only at the beginning of a measurement cycle and a steady state can 
be reached where the effect of interfering magnetic fields is minimized as the eddy current 
values approach zero. However, distortions due to ferromagnetism, mainly in steel or iron 
objects, are still a problem for DC systems. 
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3.133 Research and Commercial Products 

Despite some shortcomings, magnetic tracking systems have historically enjoyed 
popularity as a result of a small user-worn component and relative ease of use. 
Commercial products that employ magnetic sensors include Polheumus' 3SPACE 
FASTRAK and ISOTRAK II (AC electromagnetic) and Ascension's Flock of Birds and 
PcBIRD (Pulsed-DC electromagnetic). See also [Mulder, 1994 #287] for more examples. 




Polheumeus' Long Ranger Tracker 



Polheumeus' Stylus Option 




Ascension's Flock of Birds and PcBIRD transmitter 



: m 



Polheumeus' Star*Trak 
Figure 3.15: Examples of magnetic tracking systems. 



3.1.4 Mechanical Tracking 

Mechanical trackers measure joint angles and lengths between joints. Given one known 
position, all other absolute positions can be derived from the relative joint measurements. 
They are used to measure all parts of the body and have been historically employed in 
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motion capture. In implementation, they range from whole body suits that measure the 
position of all the major joints to mechanical arms to gloves that measure the location of 
the hands and fingers. 

Mechanical tracking systems can be ground-based in which one point of the tracker is 
affixed to the floor at a known location. This limits the user's range of motion. They can 
also be body-based in which the system is attached only to the user, typically in the form 
of an exoskeleton. This does not limit physical range of motion but can be prohibitive if 
the suit is heavy or bulky. 

The rotations and lengths can be measured by gears, potentiometers, and bend sensors as 
shown in Figure 3.16. 




Figure 3.16: Mechanical tracking sensors 



A potentiometer is a device that transduces a rotation or displacement to a voltage. A bend 
sensor is typically a thin strip of plastic whose resistance changes as it bends. The more it 
bends, the higher the resistance. Alternatively, optical fiber can be treated for a short 
distance on one side to lose light proportional to the angle through which it is bent. 

An advantage of mechanical trackers is the elegant addition of force feedback as 
illustrated in Sensable's Phantom and the EXOS dexterous hand master pictured in 
Figure 3.17. 




Figure 3.17: Sensable's Phantom and EXOS dexterous hand 
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They typically provide good accuracy and low latency but can be cumbersome for the 
user. The user may be constrained by the suit or arm and, therefore, may not have full 
freedom of movement. 

3.1.4.1 Research and Commercial Products 

Commercial products that employ mechanical sensors are Fakespace's Boom HF, Virtual 
Technologies' CyberGlove and MetaMotion's Gypsy (motion capture only). See also 
[Mulder, 1994 #287] for more examples. 




Figure 3.18: Some example mechanical tracking systems. 
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3.15 Optical Tracking 

Optical trackers use light to measure angles. As shown in Figure 3.19, a single point on the 
detector, an optical sensor or image plane, provides a ray defined by that pixel and the 
center of projection, C. As is the case with the acoustic medium, optical energy 
diminishes with the square of the distance between the transmitter and receiver. 




Figure 3.19: Centroid of light. 



3.1.6 Targets 

Optical tracking systems (also called image-based systems) can use two types of targets: 
active targets and passive targets. Active targets are powered such as an light-emitting 
diode (ILED). Infrared LEDs (ILEDs) are used to combat noise due to ambient light. 
Passive targets are not powered such as reflective materials or high contrast patterns. 
Regardless of categorization, a detector is used to record the object being tracked (the 
target) and from this angle measurement, position and orientation can be derived. Some 
systems use no artificial targets and, instead, use elements of the natural scene. 

3.1.7 Detectors 

Detectors can be simple video and CCD cameras (typically used with passive targets), or 
lateral-effect photodiodes (typically used with active targets) that provide the location of 
the centroid of light on the image plane as illustrated in Figure 3.19. Video cameras and 
CCDs require imaging techniques to determine position while photodiodes produce 
currents that are directly proportional to the light center's position. 

3.1.7.1 Lateral Effect PhotoDiodes (LEPDs) 

ID LEPDs place two terminals on either side of a silicon photosensitive region. An 
incident light beam produces electrons that flow laterally towards the terminals on either 
side of the region. The amount of current measured at each terminal is dependent on the 
distance of the centroid of the incident beam from the terminals. If the centroid occurs at 
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the center of the region, equal current values will be measured at each terminal. A 2D 
LEDP sensor is made up of 2 ID sensors rotated 90 degrees from each other. LEPDs can 
be used to detect both the intensity and position of an incident light beam. 

3.1.7.2 Quad Cells 

Quad cells are made up of four photosensitive cells. When a light beam is incident upon a 
quad cell, a current is generated in each of the four quadrants proportional to the amount 
of light seen by each. A perfectly circular beam illuminating the exact center of the quad 
cell will produce equal photocurrents in the four quadrants. As shown in Figure 3.20, if the 
beam is too small such that it falls in between quad cell or too large such that is covers all 
four cells, there is no information to be gained from the voltage measurements of each 
cell. 




Figure 3.20: A Quad Cell 



The x and y displacements of the beam relative to the center of the quad cell can be 
calculated using the following formulas: 

0*1 +« 2 )-( f 3 + I 4) 

X = ; : ; ; 

l \ + l 2 + l 3 + l 4 

+*' 4 )-('2 + **3) 
y ~ / 1 +/ 2 + / 3 + / 4 



3.1 .73 Charge Coupled Devices (CCDs) 

A CCD array can be a ID or 2D collection of light-sensitive cells. 



^cell/pixel 



Figure 321: ID and 2D CCD Detector Arrays 
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When light is incident upon a CCD cell, electrons are produced and each cell accumulates 
electrons proportional to the amount of incident light for the duration of the dwell time. 
The electron count per array cell is read out as a voltage to provide a per pixel luminance 
value. An "empty" cell corresponds to zero volts (black). This collection of pixel values 
produces a digital image that is similar to the analog images captured with film. The array 
of luminance values can be analyzed to pinpoint the cell (or pixel) of highest intensity and 
sub-pixel accuracy can be achieved by interpolation between discrete pixel values. The 
dwell time dictates how long the CCD will accumulate a charge and must be set large 
enough so that sufficiently high SNR is achieved for pinpointing the target. However, care 
should be taken in fixing the dwell time because it affects the update rate of the CCD. 
Long dwell times delay measurements which affect the rate at which position estimates 
can be determined. 

While optical sensors fundamentally provide angle measurements, they can also be used 
to determine range. The amount of defocus (i.e. blur) due to the limited depth of field of 
lenses can provide information about distance. Structures which are closer to the plane of 
focus for an image will appear sharper. Conversely, structures farther from the plane of 
focus will appear more blurred. As shown in Figure 3.22 (from [Goshtasby, 2001 #567; 
Wood, 2000 #568]) there is ambiguity between two points that lie equidistantly on either 
side of the plane of focus. This ambiguity can be eliminated with the addition of a second 
image plane. 




Figure 3.22: Two objects produce the same position and blur on the image 



In general, optical tracking systems exhibit high accuracy and resolution and are well 
suited to real-time systems in terms of update rate (light is a fast medium). However, there 
is much variety among optical tracking systems due to technical specification difference 
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(focal length, FOV, etc.), system configuration (outside-in, inside-out) and target type. 
This accuracy depends on a clear line of sight between the sensor and the target. If 
something in the environment or the object itself blocks this line-of-sight, optical tracking 
systems suffer from obscuration difficulties. 

In Section 4.1 we describe some traditional approaches to using optical sensors for pose 
estimation. In Appendix C we have also included a copy of [Welch, 2001 #308], which 
describes the UNC-Chapel Hill HiBall optical tracking system. 

3.1.7.4 Commercial and Research Products 

Commercial products that employ optical sensors include InMotion's CODA, 3rdTech's 
HiBall-3000, University of Iowa's Selspot II, Arcsecond's Vulcan, Ascension's laserBIRD, 
and Phoenix Technologies' Visualeyez (motion capture only). Other optical trackers 
include Omniplanar's Virtual Tracker and the Minnesota scanner. See also [Mulder, 1994 
#287] for more examples. 

3.2 Sensor Configurations 

Choosing the physical medium and sensors for a tracking system determines only a part of 
its capability. The geometric configuration of the sources and sensors also has a profound 
effect. For example, in an optical tracker we can have fixed sensors observing moving 
targets or moving sensors observing fixed targets. This has lead to the descriptions 
"outside looking in" and "inside looking out" as described in [Welch, 2001 #308], but the 
direction of "looking" is not the determining factor. Rather, as we shall see, the 
distinguishing factor is the coordinate frame in which the measurements are made. 

3.2.1 Measurements in the Laboratory Frame 

The CODA system [BL, 2000 #266] is a commercial example of a system that makes its 
measurements in the laboratory coordinate frame as illustrated in Figure 3.24. It uses three 
or more stationary ID optical sensors that observe LED beacons that are free to move (a 
typical "outside looking in" configuration). Each of the ID sensors narrows the possible 
location of the LED to a plane in three space. The three planes intersect in a mathematical 
point which is the system's estimate of the three-dimensional coordinates of the LED. The 
FlashPoint 5000 [IGT, 2000 #276] from Image Guided Technologies uses the same 
fundamental measurement strategy though the sensors and optics are quite different. The 
Selspot and OPTOTRAK [NDI, 2001 #546] systems are also essentially the same though 
they use at least two 2D sensors (each determining a line) rather than at least three ID 
sensors. 

The Minnesota Scanner [Sorensen, 1989 #467] uses spinning mirrors at fixed locations in 
the laboratory to project planes of light into the working volume. The tracked target is a 
photo-detector that detects the time at which it is illuminated by the swept plane of light. 
The time is used along with the precisely known rotation rate of the mirror to determine 
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InMotion Systems' CODA 




Phoenix's Visualeyez Arcsecond's Vulcan 



Figure 323: Some example optical tracking systems. 

the equation of a plane that passes through the detector and the center of rotation of the 
mirror. Just as in the CODA system, the multiple planes are then intersected to produce an 
estimate of the 3D coordinate of the target. The commercially available ArcSecond system 
works in the same way. 

Even though the sensors and sources have apparently swapped places in the CODA mxp30 
and the Minnesota Scanner, they are both fundamentally making an angle measurement in 
the laboratory coordinate system. Thus, while the moving sensor on the Minnesota 
Scanner is "looking out", it shares all the characteristics of a "outside-Iooking-in" system. 
What matters is the coordinate frame in which the measurements are made. 
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target 

\ / 




sensor 



Figure 324: Two sensors are shown fixed in 
lab space. They observe a moving target. 
Here in 'Flatland' the position of the image 
on the ID image-line of the sensor 
determines the equation of a line that passes 
through the target. The intersection of two of 
these lines determines the 2D position. 



sensor 



The positional sensitivity of the systems above is determined by the angular resolution of 
the optical sensors, the distance between target and sensor, and on the geometric 
configuration of the sensors. The angular resolution is determined by the field of view of 
the sensor and the resolution of measurements on the image plane. The designer of such a 
system always has a trade off between positional accuracy and working volume; higher 
accuracy requires a smaller working volume while larger working volume implies lower 
accuracy. 

As shown in Figure 3.25 below, when the geometric configuration of the system is such 
that the sensors are physically close together compared to the distance to the target, the 
planes (or lines) they determine are nearly parallel. This results in near singularity of the 
resulting equations and poor positional accuracy in the direction aligned with the planes. 
This problem is aggravated by uncertainty in the sensor estimates. Rather than being 
planes or lines, the constraint provided by a single sensor is better modeled by a cone or a 
wedge. For maximum positional accuracy the sensors should be far apart and at nearly 
right angles to one another. Unfortunately placing the sensors far apart may give rise to 
line-of-sight problems because all sensors must have a clear view of each target. 

In order to determine orientation, multiple targets must be arranged in a rigid 
configuration. Then the relative positions of the targets can be used to derive orientation. 
The sensitivity of the resulting system to small rotations is determined both by its 
positional sensitivity and the distance between targets. Orientation sensitivity is 
maximized by increasing the distance between targets but this can quickly result in a 
physically unwieldy device. 
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Figure 325: When sensors are close together as on the left, the equations of the lines 
they determine are nearly parallel resulting in less positional sensitivity in the direction 
parallel to the lines When the sensors are far apart and at nearly right angles the 
sensitivity is greatest. 



3.2.2 Measurements in the User's Frame 

The HiBall Tracker [Welch, 1999 #307; Welch, 2001 #308] uses a golf-ball sized cluster 
of six 2D optical sensors looking out at LED beacons fixed in the environment. In the 
HiBall, angular measurements are made in the moving coordinate system of the user as 
shown in Figure 3.26, rather than in the fixed coordinate system of the lab. 




Figure 3.26: A cluster of six sensors 
observes targets that are at fixed locations in 
'Flatland'. Each observation determines a 
constraint between the position of the moving 
cluster and its orientation. A minimum of 
three observations are required to determine 
the position and orientation but to ensure 
uniqueness of the solution and to allow for 
measurement errors more are better. 



Unlike the systems above for which one sighting determines a mathematically simple 
constraint such as a plane or line equation, sighting one target with one of the cameras in 
the cluster provides a difficult to visualize constraint on the relationship between the 
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cluster's position and its orientation. Given as few as three sightings it is possible to solve 
a nonlinear system of equations to determine the position and orientation of the cluster 
[Azuma, 1991 #260], With only three sightings there are up to four solutions that can be 
arbitrarily close together [Fischler, 1981 #547]. Larger numbers of sightings provide a 
unique solution and also allow the use of least-squares estimation to reduce the effects of 
noise on the measurements. 

The HiBall does not solve such nonlinear equations to determine it position. Instead it uses 
the SCAAT algorithm described in Section 4.2.5. 

The orientation sensitivity of the HiBall is determined by the angular sensitivity of the 
optical sensors in the cluster. The HiBall achieves high angular sensitivity using very 
narrow fields of view (approximately 6 degrees) and high-resolution analog sensors 
(lateral-effect photo diodes [Wallmark, 1957 #548] with approximately 1 part in 2000 
resolution). This requires that the room-mounted LEDs are densely packed to ensure that 
sufficient numbers are continuously visible. 

The positional sensitivity of the HiBall is similar to the systems described above. It varies 
with distance to the target and with sensor resolution in the same way. 

Comparing systems that measure in the laboratory frame with those that measure in the 
user's frame we see that user-centered measurements can provide higher orientation 
accuracy and comparable positional accuracy for systems of practical size. Systems with 
moving targets have the advantage that the targets are smaller and lighter and thus easier to 
attach to the user. Lab-based sensors are probably preferred when position is the only or 
most important measurement required. 

33 Hybrid Systems 

As described in Section 3.1, every type of sensor has fundamental limitations related to 
the associated physical medium. In addition there are practical limitations imposed by the 
measurement systems, and application-specific limitations related to the motion 
characteristics of the target being tracked. These limitations continuously affect the 
quantity and quality of the information. The result is that no single medium or sensor type 
provides the necessary performance over the wide spectrum of temporal and spatial 
characteristics desired for many applications. Happily several mediums exhibit 
complementary behavior, and these systems can be combined to leverage the strengths of 
each medium as needed. Systems that employ such mixed mediums are called hybrid 
systems. 

A number of research and commercial groups have recognized that hybrid systems are 
necessary for some applications, and have constructed hybrids that combine multiple 
sensors including inertial, video, and GPS [Verplaetse, 1997 #300; Verplaetse, 1996 #234; 
Golding, 1999 #272; Azuma, 1998 #493; Azuma, 1999 #494; Azuma, 1999 #259; Azuma, 
1994 #255; Azuma, 1995 #338; Behringer, 1999 #496; You, 1999 #31 1; You, 1999 #312; 
Pasman, 1999 #292; Foxlin, 1998 #270; Foxlin, 1994 #131; Foxlin, 1996 #509]. 
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Inertial Tracking 



One of the most popular technologies (mediums) used to improve or augment the 
performance of other mediums is to incorporate accelerometers and gyros in some form of 
an inertial navigation system as in [Emura, 1994 #78; List, 1983 #421; Azuma, 1995 
#338; Azuma, 1994 #255; Azuma, 1999 #494; Azuma, 1999 #259]. The reason is that 
inertial navigation systems exhibit relatively low error at high frequencies and velocities, 
and are very responsive. This is due in part to the fundamental medium, and in part to the 
nature of inertial devices and our ability to sample their signals at a relatively high rate, 
typically on the order of thousands of samples per second. (Contrast this with magnetic, 
acoustic, and optical system can typically only be sampled hundreds of times per second.) 
Unfortunately as described in Section 3.1.2, they also exhibit high error at lower 
frequencies and velocities. At low velocities (very slow or no movement) one must in 
practice contend with pronounced bias and drift error (noise). As movement slows, such 
noise begins to grow with respect to the true signal, resulting in unbounded error growth. 

Two examples of inertial hybrids are inertial-acoustic, and inertial-optical. The most well 
known example of the former is the commercial system described by [Foxlin, 1998 #270] 
and marketed by Intersense [Intersense, 2000 #277]. This system seeks to overcome the 
temporal and spatial shortcomings of purely acoustic systems (Section 3.1.1) by adding a 
relatively fast and robust inertial system. Again, the inertial system could not be used 
alone, but in combination with the acoustic system one can cover a wider spectrum of 
motion and performance. As a side note, the system described in [Foxlin, 1998 #270] 
actually uses three mediums— it uses pulsed infrared light to aid in the timing (triggering) 
of acoustic signals. 

A second example is an inertial-optical 
hybrid. As with any hybrid, the 
complementary behavior of each system 
is leveraged to obtain more accurate and 
stable tracking information than either 
system alone. With an inertial system, 
bias and drift errors dominate (grow 
unbounded) during periods of slow 
movement. However, during such periods 
the error can be controlled by an optical 
system, which would typically exhibit its 
best behavior under such conditions. 
Conversely an optical typically performs 
worst during very rapid movement, 
precisely the conditions where the inertial signal-to-noise ratio is high (see Figure 3.10). 
In addition, while a typical vision-based optical system using multi-pixel cameras is 
affected by unrelated motion in an environment, an inertial system is not and can provide 
assistance with static visual feature discrimination. Figure 3.27 offers a qualitative 
comparison of the complementary relationship between the performance of inertial and 
optical (or acoustic) mediums. 




motion 

Figure 3.27: A qualitative comparison of 
inertial vs. optical or acoustic 
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4. Approaches 



4.1 Traditional Closed-Form Approaches 



We get the intuition from linear algebra that we need at least as many equations (or 
constraints) as there are unknowns to solve a system of equations. More equations might 
be required for a non-linear system. In this section we will examine a few solutions to 
particular tracking problems using this traditional approach. In Section 4.2.5 we examine 
a new approach that sequentially applies a single constraint at a time. 

The key thing to notice is the variety of the mathematical approaches. These are only a few 
of the many formulations for these problems. 

4.1.1 Range Trackers 

For 3D position tracking using range common arrangements are three fixed microphones 
and a single moving source or three fixed sources and a single moving microphone. 
Mathematically, we must solve three simultaneous sphere equations where we know the 
origin (e.g. x 0 , y 0 , z 0 ) and the radius (e.g. r Q ) for each sphere. 



The solution of these equations is very complicated but we can simplify it greatly by 
aligning the three microphones with the axes of the coordinate system as follows: 

a. put microphone 0 at the origin of the coordinate system 

b. put microphone 1 out the X axis by one unit 

c. put microphone 2 out the Y axis by one unit 

d. all three microphones are in the Z = 0 plane. 



* 0 ) 2 + (y 

*i) 2 + (y 

x 2 ) 2 + (y 



y 0 ) 2 + (z 
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(4.1) 
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This amounts to allowing the microphones to define the coordinate system rather than 
attempting to embed them in an existing coordinate system. We can, of course, get from 
these microphone coordinates to laboratory coordinates with a separate linear transform. 
The simplified equations are now: 

2 2 2 2 
x +y +z - r Q 

(Jt-l) 2 + / + z 2 = r\ (4.2) 

2 , ,\2 2 2 
x + {y- 1) +z = r 2 



And their solution is: 
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Notice that the sign ambiguity on z prevents us from determining which side of the plane 
the target is on. This is a common problem with nonlinear systems of equations. The three 
spheres actually intersect in 2 points as shown in Figure 3.1 on page 30. Usually some 
design constraint is used to restrict z to the positive or negative subspace. 

Another subtle point in this formulation and the others in this section is the assumption 
that the multiple measurements correspond to a single position of the target. For an 
acoustic tracker with fixed microphones and a moving source, this assumption is likely 
valid because the microphones respond to a single acoustic burst from the source. On the 
other hand, for a tracker with a moving microphone, this assumption is likely to be 
violated if the sources must be operated sequentially to avoid interference. In this case the 
microphone may have moved between successive measurements. Violating the assumption 
of simultaneous measurements results in measurement errors that vary with the target's 
rate of motion. See "The Simultaneity Assumption" on page 71 for further discussion. 

4.1.2 Optical Trackers with Fixed 2D Sensors 

Each camera in an optical system with 2D sensors that are fixed in laboratory coordinates 
determines a ray in 3D. The ray can be described using the parametric form with 
parameter for camera 1 and s 2 for camera 2. The parameters vary from 0 at the center 
of projection of a camera to infinity. The parametric equations are: 

A l =C l+Sl D 

(4.4) 

— *^2^2 
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Cjand C 2 are the centers of projection of the cameras. D x is the unit-length direction 
vector determined by the image of the target on the image plane of camera 1 . Likewise for 
D 2 . The baseline B = C 2 - C } is the vector between the centers of projection. 

These two rays in three space almost certainly do not intersect. Of course the exact two 
lines must intersect because they result from images of the same point in space. But errors 
in calibration of the cameras and in determining the imaged coordinates in each camera 
will result in line equations that likely do not intersect. 

We want to determine the point in 3D that is closest to both rays. We do this by finding the 
parameter values that minimize the distances between the lines. That is, we want to 
minimize 

\\(C 2 + s 2 D 2 )-(C l+ s ] D l )\\ (4.5) 

This equation is the length of the line joining the two rays. Since the shortest line joining 
the two rays must be perpendicular to each of the rays, it must be true that 

[(C 2 + s 2 D 2 )-(C l +s l D l )]'D l = 0 

(4.6) 

[{C 2 + s 2 D 2 )-{C x + s^)] *D 2 = 0 

This system of two equations in two unknowns has the solution 

(B-D l )-(D 2 -D ] )(B-D 2 ) 

s l = 2 

1-(D 1 *D 2 ) 

(4.7) 

{D X 'D 2 ){B'D X )-{B*D 2 ) 
S 2 = 2 

The point closest to the two rays is the midpoint of this shortest line segment 

P = — — X — — (4.8) 

Examining equation (4.7) we can see the source of the difficulty described in Section 3.1 .5 
when the sensors are close together relative to the distance to the target. In this case the 
direction vectors D x and D 2 will be nearly parallel. Thus their dot product will be nearly 
one and the denominator of the equations will grow very small. This will amplify the 
effect of small errors. 

4.13 Optical Trackers with Fixed ID Sensors 

The following approach may also be used with 2D sensors by considering that you are 
given four ID measurements rather than two 2D measurements. 
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By a calibration process we determine the coefficients that map the ID sensor coordinate 
to a plane in 3D. The plane for sensor i will produce an equation of the form 



A-x + B t y + C t z = D. 



(4.9) 



Where [x y y, z] T is the 3D coordinate of the tracked point. With three such linear 
equations we can use standard solution techniques to solve for the position. 
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With more than three sensors we can use least-squares solution methods to determine the 
position with the minimum squared error. 
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Analogously to the 2D case described earlier, if the target is far away relative to the 
distance between cameras, rows of the matrix M will be very similar resulting in near 
singularity and amplification of small errors. 

4.1.4 Optical Trackers with Moving Sensors 

Azuma and Ward [Azuma, 1991 #260] document the Space-Resection approach to solving 
for the position and orientation of the sensor cluster. This method was used in the first 
generation "Ceiling Tracker" at UNC. The HiBall tracker uses a much simpler method that 
will be described later. 

Their method is too complicated to describe here. They set up the system of non-linear 
equations that described the relationships among the known 3D coordinates of the LEDs, 
the known 2D image-plane coordinates for the sightings of the LEDs, and the unknown 
position and orientation of the camera cluster. They solved the non-linear system of 
equations using an iterative approach that required a good initial guess. During normal 
system operation the previously known pose was usually an excellent guess for the current 
pose and the iterative method converged rapidly. Initialization at system startup was 
accomplished by sequentially trying a small set of different orientations to see if any will 
converge to a likely solution. The convergence region of the algorithm was large enough 
that it could acquire the initial position within a few seconds when the tracker was held 
upright at about head height. 

42 Stochastic Approaches 

While there are many application-specific approaches to "computing" (estimating) the 
position and orientation or pose of an object (see Section 4.1), most of these methods do 
not inherently take into consideration the noisy nature of the sensor measurements. While 
the requirements for the pose information varies with application, the fundamental source 
of information is the same: pose estimates are derived from noisy electrical measurements 
of mechanical, inertial, optical, acoustic, or magnetic sensors. This noise is typically 
statistical in nature (or can be effectively modeled as such), which leads us to stochastic 
methods for addressing the problems. Here we provide a very basic introduction to the 
subject, primarily aimed at preparing the reader for the material in the appendices. For a 
more extensive discussion of stochastic estimation see for example [Kailath, 2000 #279; 
Lewis, 1986 #89]. 

42.1 State-Space Models 

State-space models are essentially a notational convenience for estimation and control 
problems, developed to make what would otherwise be a notationally-intractable analysis 
tractable. Consider a dynamic process described by an n-th order difference equation 
(similarly a differential equation) of the form 

y i+ 1 = *0, ,*i + - + a n- 1, iVi -n + 1 + W i ■ f * 0 > 
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where {wj is a zero-mean (statistically) white (spectrally) random "noise" process with 
autocorrelation 

E(w p wj) = Qfrj, 

and initial values {y 0 , y_j, y_ n + j } are zero-mean random variables with a known 
nx n covariance matrix 

P Q = E(y_j,y_ k ),j 9 ke{0,n-l}. 



Also assume that 



E(w t , y .) = 0 for - n + 1 =s j " =s 0 and j s> 0 , 



which ensures [Kailath, 2000 #279] that 

EiwpyJ = 0, i*j*0. 

In other words, that the noise is statistically independent from the process to be estimated. 
Under some other basic conditions [Kailath, 2000 #279] this difference equation can be 
re-written as 
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which leads to the state-space model 

+ 1 

= [l 0 ... o]*« 



or the more general form 



(4.12) 
(4.13) 



Equation (4.12) represents the way a new state jfc. , is modeled as a linear combination 
of both the previous state x ( - and some process noise w t . Equation (4.13) describes the 
way the process measurements or observations y ( - are derived from the internal state j^. . 
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These two equations are often referred to respectively as the process model and the 
measurement model, and they serve as the basis for virtually all linear estimation methods, 
such as the Kalman filter described below. 

4.2.2 The Observer Design Problem 

There is a related general problem in the area of linear systems theory generally called the 
observer design problem. The basic problem is to determine (estimate) the internal states 
of a linear system, given access only to the system's outputs} This is akin to what people 
often think of as the "black box" problem where you have access to some signals coming 
from the box (the outputs) but you cannot directly observe what's inside. 

The many approaches to this basic problem are typically based on the state-space model 
presented in the previous section. There is typically a process model that models the 
transformation of the process state. This can usually be represented as a linear stochastic 
difference equation similar to equation (4.12): 

x k = Ax k-i +Gvt V ( 4 - 14 ) 

In addition there is some form of measurement model that describes the relationship 
between the process state and the measurements. This can usually be represented with a 
linear expression similar to equation (4.13): 

Z k = Hx k + v k . (4.15) 

The terms w k and v k are random variables (see Section 2.2.1 on page 21) representing the 
process and measurement noise respectively. 

Measurement and Process Noise 

There are many sources of noise in sensor measurements. For example, each type of 
sensor has fundamental limitations related to the associated physical medium, and when 
pushing the envelope of these limitations the signals are typically degraded. In addition, 
some amount of random electrical noise is added to the signal via the sensor and the 
electrical circuits. The time- varying ratio of "pure" signal to the electrical noise 
continuously affects the quantity and quality of the information. The result is that 
information obtained from any one sensor must be qualified as it is interpreted as part of 
an overall sequence of pose estimates, and analytical measurement models typically 
incorporate some notion of random measurement noise or uncertainty v k as shown above 
in equation (4.15). 



1 . Access to the system's control inputs is also presumed, but less relevant in the case of tracking 
and motion capture, so we will omit that aspect. See for example [Kailath, 2000 #279] for more 
information. 
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In the case of tracking or motion capture of humans, there is the additional problem that 
the user's intended motion is essentially completely unknown. While we can make 
predictions over relatively short intervals using models based on recent motion as a guide 
(see Section 5.3), such predictions assume that the user's motion is predictable, which is 
not always the case. The result is that like sensor information, ongoing estimates of the 
user pose must be qualified as they are combined with measurements in an overall 
sequence of pose estimates. In addition, analytical motion or process models typically 
incorporate some notion of random motion or uncertainty w k as shown above in 
equation (4.14). 

42 3 Optimal Estimation— The Kalman Filter 

Among the substantial number of mathematical tools that can be used for stochastic pose 
estimation from noisy sensor measurements, one of the most well-known and often-used 
tools is what is known as the Kalman filter. The Kalman filter is named after Rudolph E. 
Kalman, who in 1960 published his famous paper describing a recursive solution to the 
discrete-data linear filtering problem [Kalman, 1960 #87]. 

The filter is essentially a set of mathematical equations that implement a predictor- 
corrector type estimator that is optimal in the sense that it minimizes the estimated error 
covariance— when some presumed conditions are met. Since the time of its introduction, 
the Kalman filter has been the subject of extensive research and application, particularly in 
the area of autonomous or assisted navigation. This is likely due in large part to advances 
in digital computing that made the use of the filter practical, but also to the relative 
simplicity and robust nature of the filter itself. Rarely do the conditions necessary for 
optimality actually exist, and yet the filter apparently works well for many applications in 
spite of this situation. 

Of particular note here, the Kalman filter has been used extensively for tracking in 
interactive computer graphics. We use a single-constraint-at-a-time Kalman filter (see 
page 66) in our HiBall Tracking System [Welch, 2001 #308; Welch, 1999 #307] which is 
commercially available from 3rdTech [3rdTech, 2000 #153]. It has also been used for 
motion prediction [Azuma, 1995 #64; Azuma, 1994 #255], and it is used for multi-sensor 
(inertial-acoustic) fusion in the commercial Constellation™ wide-area tracking system by 
Intersense [Foxlin, 1998 #270; Intersense, 2000 #277]. See also [Azarbayejani, 1994 
#336; Emura, 1994 #78; Emura, 1994 #505; Fuchs (Foxlin), 1993 #271; Mazuryk, 1995 
#284; Van Pabst, 1993 #535]. 

We maintain a popular web site on the topic of the Kalman filter. The web address is 
http://www.cs.unc.edu/-welch/kalman/. 

On this site you will find references to (and some copies of) introductory and advanced 
material on the Kalman filter. New for 2001— we will be bringing on-line a Java-based 
Kalman Filter Learning Tool. In addition, we are also teaching an Introduction to the 
Kalman Filter two-hour tutorial at SIGGRAPH 2001 (Course 8), for which there are 
separate course notes. Finally, in Appendix A of this course pack (page 79) we have 
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included a copy of our own introductory technical report. Beyond this Appendix, a very 
"friendly" introduction to the general idea of the Kalman filter can be found in Chapter 1 
of [Maybeck, 1979 #93]— which is available from the above Kalman filter web site, while 
a more complete introductory discussion can be found in [Sorenson, 1970 #101], which 
also contains some interesting historical narrative. More extensive references include 
[Gelb, 1974 #82; Grewal, 2001 #245; Maybeck, 1979 #93; Lewis, 1986 #89; Jacobs, 1993 
#86; Brown, 1996 #246]. 

4.2.4 Hybrid or Multi-Sensor Fusion 

Stochastic estimation tools such as the Kalman filter (see Appendix A on page 79) can be 
used to combine or fuse information from different mediums or sensors for hybrid systems 
(see Section 3.3 on page 54). The basic idea is to use the Kalman filter to weight the 
different mediums most heavily in the circumstances where they each perform best, thus 
providing more accurate and stable estimates than a system based on any one medium 
alone. In particular, the indirect feedback Kalman filter shown in Figure 4.1 (also called a 
complementary or error-state Kalman filter) is often used to combine the two mediums 
[Maybeck, 1979 #93]. In such a configuration, the Kalman filter is used to estimate the 
difference between the current inertial and optical (or acoustic) outputs, i.e. it continually 
estimates the error in the inertial estimates by using the optical system as a second 
(redundant) reference. This error estimate is then used to correct the inertial estimates. The 
adjustment or tuning of the Kalman filter parameters then determines the weight of the 
correction as a function of frequency. By slightly modifying the Kalman filter, adaptive 
velocity response can be incorporated also. This can be accomplished by adjusting (in real 
time) the expected optical measurement error as a function of the magnitude of velocity. 
The dashed line in Figure 4.1 indicates the additional use of inertial estimates to help a 
image-based optical system to prevent tracking of moving scene objects (i.e. unrelated 
motion in the environment). 
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Figure 4.1: The Kalman filter used in an indirect-feedback 
configuration to optimally weight inertial and optical information. 



In such a configuration, the Kalman filter uses a common process model, but a distinct 
measurement model for each of the inertial and optical subsystems. See Appendix A on 
page 79 for more information about these models. 
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4.2.5 Single-Constraint-at-a-Time Tracking 

A conventional approach to pose estimation is to collect a group of sensor measurements 
and then to attempt to simultaneously solve a system of equations that together completely 
constrain the solution. For example, the 1991 UNC-Chapel Hill wide-area opto-electronic 
tracking system [Ward, 1992 #540; Wang, 1990 #301] collected a group of diverse 
measurements for a variety of LEDs and sensors, and then used a method of simultaneous 
non-linear equations called Collinearity to estimate the pose of the head-worn sensor 
fixture [Azuma, 1991 #260]. There was one equation for each measurement, expressing 
the constraint that a ray from the front principal point of the sensor lens to the LED, must 
be collinear with a ray from the rear principal point to the intersection with the sensor. 
Each estimate made use of typically 20 (or more) measurements that together over- 
constrained the solution. 

This multiple constraint method had several drawbacks. First, it had a significantly lower 
estimate rate due to the need to collect multiple measurements per estimate. Second, the 
system of non-linear equations did not account for the fact that the sensor fixture 
continued to move throughout the collection of the sequence of measurements. Instead the 
method effectively assumes that the measurements were taken simultaneously. The 
violation of this simultaneity assumption could introduce significant error during even 
moderate motion. Finally, the method provided no means to identify or handle unusually 
noisy individual measurements. Thus, a single erroneous measurement could cause an 
estimate to jump away from an otherwise smooth track. 

In contrast, there is typically nothing about solutions to the observer design problem in 
general (Section 4.2.2), or the Kalman filter in particular (Section 4.2.3), that dictates the 
ordering of measurement information. In 1996 we introduced a new Kalman filter-based 
approach to tracking, an approach that exploits this flexibility in measurement processing. 
The basic idea is to update the pose estimate as each new measurement is made, rather 
than waiting to form a complete collection of measurement. Because single measurements 
under-cons train the mathematical solution, we refer to the approach as single-constraint- 
at-a-time or SCAAT tracking [Welch, 1997 #304; Welch, 1996 #303]. The key is that the 
single measurements provide some information about the tracker state, and thus can be 
used to incrementally improve a previous estimate. We intentionally fuse each individual 
"insufficient" measurement immediately as it is obtained. With this approach we are able 
to generate estimates more frequently, with less latency, with improved accuracy, and we 
are able to estimate the LED positions on-line concurrently while tracking. This approach 
is used in our laboratory-based HiBall Tracking System [Welch, 2001 #308; Welch, 1999 
#307], the commercial version of the same system [3rdTech, 2000 #153], and the 
commercial systems manufactured by Intersense [Foxlin, 1998 #270; Intersense, 2000 
#277]. 

One of the most interesting things about the SCAAT approach is that it can be almost 
universally applied in place of conventional approaches (Section 4.1). Essentially all you 
need is to be able to predict a sensor measurement given a current pose estimate. In other 
words, if you can formulate a measurement model as in equation (4.15) in Section 4.2.2, 



66 



Course 11— Tracking: Beyond 15 Minutes of Thought 



you can use the SCAAT approach. In fact, one of the unusual things about the approach is 
that you never actually compute the pose directly, you only compute the measurement you 
think the sensor should "see." For computer graphics people, the measurement model for 
optical sensors in particular is "simple" as it typically resembles the normal graphics 
viewing transformations. 

Consider for a moment the UNC hybrid landmark-magnetic tracker presented at 
SIGGRAPH 96 [State, 1996 #469]. This system uses an off-the-shelf Ascension magnetic 
tracking system along with a vision-based landmark recognition system to achieve 
superior synthetic and real image registration for augmented reality assisted medical 
procedures. The vision-based component attempts to identify and locate multiple known 
landmarks in a single image before applying a correction to the magnetic readings. A 
SCAAT implementation would instead predict the location of a landmark in the image, 
then identify and locate that single landmark in the actual image. It would process one 
landmark per image update in this fashion. Not only would this approach increase the 
frequency of landmark-based correction (given the necessary image processing) but it 
would offer the added benefit that unlike the implementation presented in [State, 1996 
#469], no special processing would be needed for the cases where the number of visible 
landmarks falls below the number necessary to determine a complete position and 
orientation solution. The SCAAT implementation would simply cycle through any 
available landmarks, one at a time. Even with only one visible landmark the method would 
continue to operate as usual, using the information provided by the landmark sighting to 
refine the estimate. 

For more information see [Welch, 1997 #304] and [Welch, 1996 #303], the latter of which 
is included at the end of this course pack in Appendix C. 
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5. Problems and Insights 

In this chapter we consider some of the primary sources of error in estimates from tracking 
and motion capture systems. While we focus specifically on head tracking for interactive 
computer graphics, the basic principles are applicable to tracking of hands, and even to 
motion capture. We attempt to provide some insight into the source of the error, and even a 
means for addressing (not necessarily solving) one of the biggest problems: end-to-end 
delay in the entire tracking and graphics pipeline. 

5.1 Classification of Error 

There are of course many causes of visual error in interactive computer graphics systems. 
There are many people (aside from the authors) who would argue that various errors 
originating in the tracking system dominate all other sources. In his 1995 Ph.D. 
dissertation thoroughly analyzing the sources of error in an Augmented Reality system for 
computer-aided surgery, Rich Holloway stated 

Clearly, the head tracker is the major cause of registration error in AR systems. The 
errors come as a result of errors in aligning the tracker origin with respect to the World 
CS (which may be avoidable), measurement errors in both calibrated and 
multibranched trackers, and delay in propagating the information reported by the 
tracker through the system in a timely fashion. 

Rich Holloway 's dissertation offers a very thorough look at the sources of error in the 
entire AR pipeline, including the stages associated with tracking. Much of the dissertation 
is applicable to VR systems in general, and even motion capture. We highly encourage you 
to take a look if you are really interested in a rigorous mathematical analysis. Chapter 8 of 
the dissertation discusses some methods for combating the problems introduced by tracker 
error, in particular delay. The dissertation is available from http://www.cs.unc.edu/ 
Publications/Dissertations .html . 

Sources of Error 

For a person designing, calibrating, or using a tracking or motion capture system, it is 
useful to have some insight into where errors come from. As [Deering, 1992 #552] notes, 
"...the visual effect of many of the errors is frustratingly similar." This is especially true for 
tracking errors. We have seen people build VR applications with obvious head tracker 
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transformation errors, and yet people had great difficulty figuring out what part of the long 
sequence of transforms was wrong, if it was a static calibration error, or a simple sign 
error. 

Yet even when all of the transforms are of the correct form, the units of translation and 
orientation match, and all the signs are correct, there are still unavoidable errors in motion 
tracking, errors that confound even the most experienced of practitioners of interactive 
computer graphics. No matter what the approach (see Chapter 4), the process of pose 
estimation can be thought of as a sequence of events and operations. The sequence begins 
with the user motion, and typically ends with a pose estimate arriving at the host 
computer, ready to be consumed by the application. Clearly by the time a pose estimate 
arrives at the host computer it is already "late"— and you still have to render an image and 
wait for it to be displayed! Section 5.3 offers some hope for addressing the long delays and 
in some sense "catching up" with the user motion, but that doesn't mean that we don't 
want to minimize the delay, and to understand how all of the various errors affect the 
outcome. 

The sources of error in tracking and motion capture can generally be divided into two 
primary classes. The first includes all errors related to making static measurements, either 
off line prior to running an application, or on line during normal operation. We call this 
static measurement error. The second includes all errors that arise from the inevitable 
sources of delay in the tracking pipeline. We call this delay-induced error. 

5.1.1 Static Measurement Error 

Static Field Distortion 

For an immobile sensor (static motion), we can divide the measurement errors into two 
types: repeatable and nonrepeatable . Some trackers (for example, magnetic ones) have 
systematic, repeatable distortions of their measurement volume which cause them to give 
erroneous data; we will call this effect static field distortion. The fact that these 
measurement errors are repeatable means that they can be measured and corrected as long 
as they remain unchanged between this calibration procedure and run time. 

Random Noise or Jitter 

Here we consider the non-repeatable errors made by the tracker for an immobile sensor. 
As we discussed in general in Section 2.1.1, some amount of noise in the sensor inputs is 
inevitable with any measurement system, and this measurement noise typically leads to 
random noise or jitter in the pose estimates. By our definition, this type of error is not 
repeatable and therefore not correctable a priori via calibration. Moreover, the amount of 
jitter in the tracker's outputs limits the degree to which the tracker can be calibrated. The 
amount of jitter is often proportional to the distance between the sensor(s) and the 
source(s), and may become relatively large near the edge of the tracker's working volume. 
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5.1.2 Delay-Induced Error 

Any measurement of a non-repeating, time- varying phenomenon is valid (at best) at the 
instant the sample occurs— or over the brief interval it occurs, and then becomes "stale" 
with the passage of time until the next measurement. The age of the data is thus one factor 
in its accuracy. Any delay between the time the measurement is made and the time that 
measurement is manifested by the system in a pose estimate contributes to the age and 
therefore the inaccuracy of that measurement. The older the tracker data is, the more likely 
that the displayed image will be misaligned with the real world. 

We feel that concerns related to dynamic error (including dynamic tracker error and 
delay-induced error from above) deserve distinct discussion. This class of error is often 
less obvious when it occurs (you know something isn't correct, but you don't know why), 
and when you do recognize it, it is difficult to know where to look to minimize the effects. 

First-Order Dynamic Error 

Probably the most obvious effect here is the overall dynamic error caused by continued 
user motion after a tracker cycle (sample, estimate, produce) has started. If the user's head 
is rotating with an angular velocity of 6 and translating with a linear velocity of x then 
simple first-order models for the delay-induced orientation and translation error are given 
by 

e dynj e = 9Af (5.1) 
S d yn,* = * A < ( 5 - 2 ) 



where At is the sum of the total motion delay At m for the tracking system as described 
below in Section 5.2, as well as At g , the delay through the remainder of the graphics 
pipeline— including rendering and image generation, video synchronization delay, frame 
synchronization delay, and internal display delay. The video synchronization delay is the 
amount of time spent waiting for a frame buffer to swap— on average 1/2 the frame time. 
(Synchronization delay in general is described more below.) The internal display delay is 
any delay added by the display device beyond the normal frame delay. For example, some 
LCD and DLP devices buffer images internally in a non-intuitive manner. The delay must 
be measured on a per-device basis if it is important. 

The Simultaneity Assumption 

Many popular tracking systems collect sensor measurements sequentially, and then 
assume (mathematically) that they were collected simultaneously. We refer to this as the 
simultaneity assumption. If the target remains motionless this assumption introduces no 
error. However if the target is moving, the violation of the assumption introduces error. 
Consider that typical arm and wrist motion can occur in as little as 1/2 second, with typical 
"fast" wrist tangential motion occurring at three meters per second [Atkeson, 1985 #556]. 
For the a typical magnetic tracker with 20-80 ms of latency, such "fast" motion 
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corresponds to approximately one to ten centimeters of translation throughout the 
sequence of sensor samples used for a single estimate. For systems that attempt sub- 
millimeter accuracies, even slow motion occurring during a sequence of sequential 
samples impacts the accuracy of the estimates. For example, in a multiple-sample system 
with Afy = 30 [ms] of total sample time, motion of only three centimeters per second 
corresponds to approximately one millimeter of target translation throughout the sequence 
of samples for one estimate. Figure 5.1 presents the results of a simulation from [Welch, 
1996 #303] (page 161) which includes a more extensive analysis of this error source. 
Figure 5.1 shows how estimates can be pulled away from the truth by an error amount of 
E sa as the simultaneity assumption is violated. 
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Figure 5.1: Simulated error resulting from the simultaneity assumption. 
The family of curves shows how simulated position estimates become 
skewed by the simultaneity assumption as a target undergoes motion with a 
one Hertz sinusoidal velocity. Note the increasing skew of the estimates 
with total sensor sample times of At s E {0, 10, 40, 70, 100} ms. Details 
appear in [Welch, 1996 #303] . 



Sensor Sample Rate 

Per Shannon's sampling theorem [Jacobs, 1993 #86] the measurement or sampling rate 
r ss should be at least twice the true target motion bandwidth, or an estimator may track an 
alias of the true motion. Given that common arm and head motion bandwidth 
specifications range from 2 to 20 Hz [Fischer, 1990 #555; Foxlin, 1993 #372; Neilson, 
1972 #554], the sampling rate should ideally be greater than 40 Hz. Furthermore, the 
estimation rate r should be as high as possible so that slight (expected and acceptable) 
estimation error can be discriminated from the unusual error that might be observed 
during times of significant target dynamics. 
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Synchronization Delay 

While other latencies (delays) certainly do exist in the typical VE system [Council, 1994 
#358; Wloka, 1995 #557; Mine, 1993 #433] tracker latency is unique in that it determines 
how much time elapses before the first possible opportunity to respond to user motion. 
When the user moves, we want to know as soon as possible. Within the tracking system 
pipeline of events (and throughout the rendering pipeline) there are both fixed latencies 
associated with well-defined tasks such as executing functions to compute the pose, and 
variable latencies associated with the synchronization between well-defined asynchronous 
tasks. The latter is often called synchronization delay, although sometimes also phase 
delay or rendezvous delay. See for example Figure 5.2. 

synchronization 

delay ~\ a * fixed-time A 
I * I processing t j 

estimate .|. - ^s/y//))//////\ i"- 

II 1 I I II I 



measure 



time 



Figure 52: Synchronization delay. A measurement is taken at a but 
not used to estimate the pose until a' . The intervening time is called 
synchronization delay. 



In the example of Figure 5.2, measurements and pose estimates occur at regular but 
different rates. Inevitably, any measurement will sit for some time before being used in to 
compute a pose estimate. At best, the measurement will be read immediately after it is 
made. At worst the measurement will be read just before it is replaced with a newer 
measurement. On average the delay would be 1 / 2 the measurement rate. 

5.2 Total Tracker Error 

Figure 5.3 presents a more involved example, a sequence of inter- tracker events and the 
corresponding delays. Consider an instantaneous step-like user motion as depicted in 
Figure 5.3. The sequence of events begins at t , the instant the user begins to move. In 
this example the sensors are sampled at a regular rate r = 1/x , such as would 
typically be the case with video or a high-speed A/D conversion. On average there will be 
kt ss - * ss /2 seconds of sample synchronization delay before any sample is used for 
pose estimation. Because the pose estimate computations are repeated asynchronously at 
the regular rate of r e = l/x^ there will be an average of At € = x^/2 seconds of 
estimation synchronization delay, after which time the estimation will take x e seconds. 
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Assuming a client-server architecture such as [VRPN, 2001 #558] the final estimate will 
be written to a server communications buffer where it is being read at a rate of 
r b = and will therefore wait an average of &t srb = seconds before 

being read and transmitted over the network to the client. The network transmission itself 
will take x , and the final client read-buffer synchronization delay will take 
At crb = x cr ^/2 seconds, where x crb = l/r crb (the client read-buffer rate). The total 
(average) motion delay in this example is then 



At = / , - / 
m m m 

= ^ss + ^e +X e + ^srb +X net + ^crb 

111 1 

+ —■ — + T„ + +T„„. + 



(5.3) 



where r is the sensor sample rate, r is the estimate rate, % e - 1 /r , r srb is the server 
read-buffer rate, % net is the network transmission time, and r crb is the client read-buffer 
rate. 

Note that this bound does not include any latency inherently added by pose estimate 
computations that also implement some form of filtering. 

Summing the static measurement error from Section 5.1.1, the error e' caused by 
violation of the simultaneity assumption, and the dynamic error given by equations (5.1) 
and (5.2), we get a total error of 

8 stat, 6 sa,0 v m g' (5 4) 

£ » £ ♦ * +£ +Jc(Af + At ) 
°jc stat, x sa, x K m "*g ; 

where At m is from equation (5.3), and includes the remainder of the graphics 
pipeline delay as described in "First-Order Dynamic Error" above in Section 5.1.2. 
Clearly the final rotation and translation error is sensitive to both the user motion velocity, 
and the total delay of the tracker and graphics pipeline. 



74 



Course 1 1 — Tracking: Beyond 15 Minutes of Thought 



read 
buffer 



write 
buffer 



network - 



read 
buffer 



write 
buffer 



estimate 



sample 
sensor 



user motion 







H--H--I 




□ 



m 



time 

Figure 53: An example sequence of inter-tracker events and delays. 



53 Motion Prediction 

When trackers are used to implement VE or AR systems, end-to-end delays the total 
system will result in perceived swimming of the virtual world whenever the user's head 
moves. The delay causes the virtual objects to appear to follow the user's head motion 
with a velocity dependent error. 
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The sequence of events in a head-mounted display system goes something like this: 



Time 


Event 


'o 


tracker measures user's pose 




tracker reports the pose 


h 


application receives the reported pose 


h 


updated image is ready in the hidden buffer of a double-buffered display 


U 


buffer swap happens at vertical interval 


*5 


image is scanned out to the display 



Table 5.1: Time series of events in a head-mounted display system. 



The interval from t 0 to t 5 is on the order of 30ms in the fastest systems and upwards to 
200ms in the slowest. If the user is moving during this interval the image finally displayed 
at t 5 will not be appropriate for the user's new position. We are displaying images 
appropriate for where the user was rather than for where he is. 

The most important step in combating this swimming is to reduce the end-to-end delay. 
This process can be taken only so far though. Each of the steps takes some time and this 
time is not likely to be reduced to negligible simply by accelerating the hardware. 

After the avoidable delays have been eliminated we can mitigate the effect of the 
unavoidable delays by using motion prediction. Our goal is to extrapolate the user's past 
motion to predict where he will be looking at the time the new image is ready. As Azuma 
[Azuma, 1995 #64] points out, this is akin to driving a car by looking only the rear- view 
mirror. To keep the car on the road, the driver must predict where the road will go, based 
solely on the view of the past and knowledge of roads in general. The difficulty of this task 
depends on how fast the car is going and on the shape of the road. If the road is straight 
and remains so, then the task is easy. If the road twists and turns unpredictably, the task 
will be impossible. 

Motion predictors attempt to extract information from past measurements to predict future 
measurements. Most methods, at their core, attempt to estimate the local derivatives so 
that a Taylor series can be evaluated to estimate the future value. The differences among 
methods are mostly in the type and amount of smoothing applied to the data in estimating 
the derivatives. 

The simplest approach simply extends a line through the previous two measurements to 
the time of the prediction. This approach will be very sensitive to noise in the 
measurements. More sophisticated approaches will take weighted combinations of several 
previous measurements. This will reduce sensitivity to noise but will incur a delay in 
responding to rapid changes. All methods based solely on past measurements of position 
and orientation will face a trade off between noise and responsiveness. 
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Performance of the predictor can be improved considerably if direct measurements of the 
derivatives of motion are available from inertial sensors. As described earlier, linear 
accelerometers and rate gyros provide estimates of the derivatives of motion with high 
bandwidth and good accuracy. Direct measurements are superior to differentiating the 
position and orientation estimates because they are less noisy and are not delayed. 

Azuma [Azuma, 1994 #255] demonstrated prediction using inertial sensors that reduced 
swimming in an augmented reality system by a factor of 5 to 10 with end-to-end delay of 
80 [ms]. Further in [Azuma, 1995 #256] he shows that error in predictions based on 
derivatives and simple models of motion are related to the square of the product of the 
prediction interval and the bandwidth of the motion sequence. Doubling the prediction 
interval for the same sort in input will quadruple the error. 
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A. An Introduction to the Kalman Filter 

This appendix is a copy of UNC technical report TR 95-041 , written by 
Welch and Bishop in 1995. It is included to provide a ready and accessible 
introduction to both the discrete Kalman filter and the extended Kalman fil- 
ter. This report and other useful material can be found at the authors' Kal- 
man filter web site, http://www.es .unc.edu/~welch/kalman/. 



A.l The Discrete Kalman Filter 



In 1960, R.E. Kalman published his famous paper describing a recursive solution to the 
discrete-data linear filtering problem [Kalman60]. Since that time, due in large part to 
advances in digital computing, the Kalman filter has been the subject of extensive research 
and application, particularly in the area of autonomous or assisted navigation. A very 
"friendly" introduction to the general idea of the Kalman filter can be found in Chapter 1 
of [Maybeck79], while a more complete introductory discussion can be found in 
[Sorenson70], which also contains some interesting historical narrative. More extensive 
references include [Gelb74; Grewal93; Maybeck79; Lewis86; Brown92; Jacobs93]. 

A.1.1 The Process to be Estimated 

The Kalman filter addresses the general problem of trying to estimate the state x E 9t n of 
a discrete-time controlled process that is governed by the linear stochastic difference 
equation 

x k = Ax k _ x +Bu k + w k _ l , (A.l) 



with a measurement z E JR m that is 



z k = Hx k + v k . (A.2) 



The random variables w k and v k represent the process and measurement noise 
(respectively). They are assumed to be independent (of each other), white, and with 
normal probability distributions 

p(w)~tf(0,G), ( A3 > 
p(v)~N(0,R). (A.4) 
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In practice, the process noise covariance Q and measurement noise covariance R 
matrices might change with each time step or measurement, however here we assume they 
are constant. 

The n x n matrix A in the difference equation equation (A.l) relates the state at the 
previous time step k - 1 to the state at the current step k , in the absence of either a driving 
function or process noise. Note that in practice A might change with each time step, but 
here we assume it is constant. The n x / matrix B relates the optional control input u E di 
to the state x. The m x n matrix H in the measurement equation equation (A.2) relates the 
state to the measurement z^. In practice H might change with each time step or 
measurement, but here we assume it is constant. 

A.l .2 The Computational Origins of the Filter 

We define x k E fft n (note the "super minus") to be our a priori state estimate at step k 
given knowledge of the process prior to step k, and x k £ to be our a posteriori state 
estimate at step k given measurement z k . We can then define a priori and a posteriori 
estimate errors as 

e k = x k -x k ,and 
e k = x k -x k . 

The a priori estimate error covariance is then 

and the a posteriori estimate error covariance is 

P k = E[e k e?]. (A.6) 

In deriving the equations for the Kalman filter, we begin with the goal of finding an 
equation that computes an a posteriori state estimate x k as a linear combination of an a 
priori estimate x k and a weighted difference between an actual measurement z k and a 
measurement prediction Hx k as shown below in equation (A.7). Some justification for 
equation (A.7) is given in "The Probabilistic Origins of the Filter" found below. 

x k = x k + K(z k -Hx k ) (A.7) 

The difference (z k - Hx k ) in equation (A.7) is called the measurement innovation, or the 
residual. The residual reflects the discrepancy between the predicted measurement Hx k 
and the actual measurement z k • A residual of zero means that the two are in complete 
agreement. 
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The n x m matrix K in equation (A.7) is chosen to be the gain or blending factor that 
minimizes the a posteriori error covariance equation (A. 6). This minimization can be 
accomplished by first substituting equation (A.7) into the above definition for e k , 
substituting that into equation (A. 6), performing the indicated expectations, taking the 
derivative of the trace of the result with respect to K, setting that result equal to zero, and 
then solving for K. For more details see [Maybeck79; Brown92; Jacobs93]. One form of 
the resulting K that minimizes equation (A .6) is given by 1 

K k = P k H T (HP k H T + R)~ l 

P k H T • (A.8). 

HP k H T + R 

Looking at equation (A.8) we see that as the measurement error covariance R approaches 
zero, the gain K weights the residual more heavily. Specifically, 

lim K h = H~ l . 

On the other hand, as the a priori estimate error covariance P k approaches zero, the gain 
K weights the residual less heavily. Specifically, 

lim K k = 0 . 

Another way of thinking about the weighting by K is that as the measurement error 
covariance R approaches zero, the actual measurement z k is "trusted" more and more, 
while the predicted measurement Hx k is trusted less and less. On the other hand, as the a 
priori estimate error covariance P k approaches zero the actual measurement z k is trusted 
less and less, while the predicted measurement Hx k is trusted more and more. 

A.13 The Probabilistic Origins of the Filter 

The justification for equation (A.7) is rooted in the probability of the a priori estimate x k 
conditioned on all prior measurements z k (Bayes' rule). For now let it suffice to point out 
that the Kalman filter maintains the first two moments of the state distribution, 

E[x k ] = x k 



1 . All of the Kalman filter equations can be algebraically manipulated into to several forms. 
Equation equation (A.8) represents the Kalman gain in one popular form. 
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The a posteriori state estimate equation (A .7) reflects the mean (the first moment) of the 
state distribution— it is normally distributed if the conditions of equation (A. 3) and 
equation (A .4) are met. The a posteriori estimate error covariance equation (A .6) reflects 
the variance of the state distribution (the second non-central moment). In other words, 

p(x k \z k ) ~ N(E[x k l E[{x k - x k )(x k - x k ) T ]) 
= N(x k ,P k ). 

For more details on the probabilistic origins of the Kalman filter, see [Maybeck79; 
Brown92; Jacobs93]. 

A. 1.4 The Discrete Kalman Filter Algorithm 

We will begin this section with a broad overview, covering the "high-level" operation of 
one form of the discrete Kalman filter (see the previous footnote). After presenting this 
high-level view, we will narrow the focus to the specific equations and their use in this 
version of the filter. 

The Kalman filter estimates a process by using a form of feedback control: the filter 
estimates the process state at some time and then obtains feedback in the form of (noisy) 
measurements. As such, the equations for the Kalman filter fall into two groups: time 
update equations and measurement update equations. The time update equations are 
responsible for projecting forward (in time) the current state and error covariance 
estimates to obtain the a priori estimates for the next time step. The measurement update 
equations are responsible for the feedback— i.e. for incorporating a new measurement into 
the a priori estimate to obtain an improved a posteriori estimate. 

The time update equations can also be thought of as predictor equations, while the 
measurement update equations can be thought of as corrector equations. Indeed the final 
estimation algorithm resembles that of a predictor-corrector algorithm for solving 
numerical problems as shown below in Figure A.l . 
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r > 

Time Update Measurement Update 
("Predict") ("Correct") 

Figure A.1: The ongoing discrete Kalman filter cycle. The 
time update projects the current state estimate ahead in 
time. The measurement update adjusts the projected 
estimate by an actual measurement at that time. 

The specific equations for the time and measurement updates are presented below in 
table A. 1 and table A. 2. 

Table A.1: Discrete Kalman filter time update equations. 



P\ = AP k _ } A T + Q (A.10) 

Again notice how the time update equations in table A.1 project the state and covariance 
estimates forward from time step k - 1 to step k . A and B are from equation (A.1), while 
Q is from equation (A. 3). Initial conditions for the filter are discussed in the earlier 
references. 

Table A 2: Discrete Kalman filter measurement update equations. 
K k = P k H T (HP k H T + R)~ l (A.ll) 

h = ** + K k^k~ H h) < A - 12 ) 

P k = (l-K k H)P k (A.13) 

The first task during the measurement update is to compute the Kalman gain, K k . Notice 
that the equation given here as equation (A.1 1) is the same as equation (A.8). The next 
step is to actually measure the process to obtain z k , and then to generate an a posteriori 
state estimate by incorporating the measurement as in equation (A. 12). Again 
equation (A. 12) is simply equation (A.7) repeated here for completeness. The final step is 
to obtain an a posteriori error covariance estimate via equation (A.13). 
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After each time and measurement update pair, the process is repeated with the previous a 
posteriori estimates used to project or predict the new a priori estimates. This recursive 
nature is one of the very appealing features of the Kalman filter— it makes practical 
implementations much more feasible than (for example) an implementation of a Wiener 
filter [Brown92] which is designed to operate on all of the data directly for each estimate. 
The Kalman filter instead recursively conditions the current estimate on all of the past 
measurements. Figure A .2 below offers a complete picture of the operation of the filter, 
combining the high-level diagram of Figure A.l with the equations from table A.l and 
table A.2. 




Time Update ("Predict") 



(1) Project the state ahead 

x k = Ax k _ r 



Bu,, 



(2) Project the error covariance ahead 

p k = AP k-i AT+ Q 



T 



(1) Compute the Kalman gain 



-1 



K k = P k H T (HP k H T + R) 

(2) Update estimate with measurement 

(3) Update the error covariance 

P k = (I-K k H)P k 



Initial estimates for x k _ j and P k _ j 

Figure A 2: A complete picture of the operation of the Kalman filter, combining the 
high-level diagram of Figure A.l with the equations from table A.l and table A.2. 



A.2 The Extended Kalman Filter (EKF) 



A2.1 The Process to be Estimated 



As described above in Section A. 1.1, the Kalman filter addresses the general problem of 
trying to estimate the state x E 9t of a discrete-time controlled process that is governed 
by a linear stochastic difference equation. But what happens if the process to be estimated 
and (or) the measurement relationship to the process is non-linear? Some of the most 
interesting and successful applications of Kalman filtering have been such situations. A 
Kalman filter that linearizes about the current mean and covariance is referred to as an 
extended Kalman filter or EKF. 

In something akin to a Taylor series, we can linearize the estimation around the current 
estimate using the partial derivatives of the process and measurement functions to 
compute estimates even in the face of non-linear relationships. To do so, we must begin by 
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modifying some of the material presented in Section A.l. Let us assume that our process 
again has a state vector x G dl n , but that the process is now governed by the non-linear 
stochastic difference equation 

x k = f( x k-V U k> W k-0> < A - 14 ) 

with a measurement z 6E Si™ that is 

z k = h(x k , v k ) , (A.15) 

where the random variables w k and v k again represent the process and measurement 
noise as in equation (A.3) and equation (A.4). In this case the non-linear function / in the 
difference equation equation (A. 14) relates the state at the previous time step k - 1 to the 
state at the current time step k . It includes as parameters any driving function u k and the 
zero-mean process noise w k . The non-linear function h in the measurement equation 
equation (A.15) relates the state x k to the measurement z k . 

In practice of course one does not know the individual values of the noise w k and v k at 
each time step. However, one can approximate the state and measurement vector without 
them as 

and 

l k = *(**;(>), (A.17) 

where x k is some a posteriori estimate of the state (from a previous time step k). 

It is important to note that a fundamental flaw of the EKF is that the distributions (or 
densities in the continuous case) of the various random variables are no longer normal 
after undergoing their respective nonlinear transformations. The EKF is simply an ad hoc 
state estimator that only approximates the optimality of Bayes' rule by linearization. Some 
interesting work has been done by Julier et al. in developing a variation to the EKF, using 
methods that preserve the normal distributions throughout the non-linear transformations 
[Julier96]. 

A.22 The Computational Origins of the Filter 

To estimate a process with non-linear difference and measurement relationships, we begin 
by writing new governing equations that linearize an estimate about equation (A.l 6) and 
equation (A.17), 

* k ~~x k + A(x k _\-x k _\) + W^it-i > (A. 18) 

+ Vv k . (A.19) 
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where 

• x k and z k are the actual state and measurement vectors, 

• x k and z k are the approximate state and measurement vectors from 
equation ( A . 1 6) and equation ( A . 1 7) , 

• x k is an a posteriori estimate of the state at step k, 

• the random variables w k and v k represent the process and measurement noise 
as in equation (A. 3) and equation (A.4). 

• A is the Jacobian matrix of partial derivatives of / with respect to x, that is 

• W is the Jacobian matrix of partial derivatives of / with respect to u>, 

• His the Jacobian matrix of partial derivatives of h with respect to x, 

dh fn 

"[U] = ^*>°>' 

• V is the Jacobian matrix of partial derivatives of h with respect to v, 

Note that for simplicity in the notation we do not use the time step subscript k with the 
Jacobians A 9 W 9 H 9 and V , even though they are in fact different at each time step. 

Now we define a new notation for the prediction error, 

~e Xk = x k -x k , (A.20) 

and the measurement residual, 
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Remember that in practice one does not have access to x k in equation (A. 20), it is the 
actual state vector, i.e. the quantity one is trying to estimate. On the other hand, one does 
have access to z k in equation (A.21), it is the actual measurement that one is using to 
estimate x k . Using equation (A .20) and equation (A.21) we can write governing equations 
for an error process as 

e* t ~ A( **-i - **-i )+£ *' (A - 22) 



where z k and r\ k represent new independent random variables having zero mean and 
covariance matrices WQW T and VRV T , with Q and R as in (A. 3) and (A.4) 
respectively. 

Notice that the equations equation (A.22) and equation (A.23) are linear, and that they 
closely resemble the difference and measurement equations equation (A.l) and 
equation (A.2) from the discrete Kalman filter. This motivates us to use the actual 
measurement residual e in equation (A.21) and a second (hypothetical) Kalman filter to 
estimate the prediction error e given by equation (A.22). This estimate, call it e k , could 
then be used along with equation (A. 20) to obtain the a posteriori state estimates for the 
original non-linear process as 

x k = x k + e k . (A.24) 

The random variables of equation (A.22) and equation (A.23) have approximately the 
following probability distributions (see the previous footnote): 

P Ce Xk )~N(0,E[~e x eT k ]) 

p(* k )~N(0,WQ k W T ) 

p(y\ k )~N(0, VR k V T ) 

Given these approximations and letting the predicted value of e k be zero, the Kalman 
filter equation used to estimate e k is 



By substituting equation (A.25) back into equation (A.24) and making use of 
equation (A.21) we see that we do not actually need the second (hypothetical) Kalman 
filter: 

*k = X k + K k e z k 

_ (A.26) 
= x k + K k (z k - z k ) 
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Equation equation (A.26) can now be used for the measurement update in the extended 
Kalman filter, with x k and z k coming from equation (A. 16) and equation (A. 17), and the 
Kalman gain K k coming from equation (A.l 1) with the appropriate substitution for the 
measurement error covariance. 

The complete set of EKF equations is shown below in table A3 and table A. 4. Note that 
we have substituted x k for x k to remain consistent with the earlier "super minus" a priori 
notation, and that we now attach the subscript k to the Jacobians A , W , //, and V, to 
reinforce the notion that they are different at (and therefore must be recomputed at) each 
time step. 

Table A3: EKF time update equations. 

*\ = KfrO) (A.27) 

As with the basic discrete Kalman filter, the time update equations in table A3 project the 
state and covariance estimates from the previous time step k - 1 to the current time step 
k . Again / in equation (A.27) comes from equation (A. 16), A k and W k are the process 
Jacobians at step k, and Q k is the process noise covariance equation (A3) at step k. 

Table A.4: EKF measurement update equations. 

x k = x k + K k (z k -h(x k ,0)) (A30) 

P k = (I-K k H k )P k (A.31) 

As with the basic discrete Kalman filter, the measurement update equations in table A.4 
correct the state and covariance estimates with the measurement z k . Again h in 
equation (A30) comes from equation (A. 17), H k and V are the measurement Jacobians at 
step fc, and R k is the measurement noise covariance equation (A.4) at step k. (Note we 
now subscript R allowing it to change with each measurement.) 

The basic operation of the EKF is the same as the linear discrete Kalman filter as shown in 
Figure A. 1. Figure A3 below offers a complete picture of the operation of the EKF, 
combining the high-level diagram of Figure A.l with the equations from table A3 and 
table A.4. 
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Time Update ("Predict") 



(1) Project the state ahead 

x k = f(x k _ v u k ,0) 

(2) Project the error covariance ahead 



Measurement Update ("Correct") 



(1) Compute the Kalman gain 

K k = P k Hl(H k P k Hl + V k R k VT)- 1 

(2) Update estimate with measurement 



X, = 



h + K k (z k -h(x k ,0)) 



(3) Update the error covariance 



Initial estimates for x k and/^ 



Figure A3: A complete picture of the operation of the extended 
Kalman filter, combining the high-level diagram of Figure A.l with 
the equations from table A. 3 and table A. 4. 

An important feature of the EKF is that the Jacobian H k in the equation for the Kalman 
gain K k serves to correctly propagate or "magnify" only the relevant component of the 
measurement information. For example, if there is not a one-to-one mapping between the 
measurement z k and the state via h , the Jacobian H k affects the Kalman gain so that it 
only magnifies the portion of the residual z k -h(x k , 0) that does affect the state. Of 
course if over all measurements there is not a one-to-one mapping between the 
measurement z k and the state via h , then as you might expect the filter will quickly 
diverge. In this case the process is unobservable. 

A3 An Example: Estimating a Random Constant 



In the previous two sections we presented the basic form for the discrete Kalman filter, and 
the extended Kalman filter. To help in developing a better feel for the operation and 
capability of the filter, we present a very simple example here. 
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A3.1 The Process Model 

In this simple example let us attempt to estimate a scalar random constant, a voltage for 
example. Let's assume that we have the ability to take measurements of the constant, but 
that the measurements are corrupted by a 0.1 volt RMS white measurement noise (e.g. our 
analog to digital converter is not very accurate). In this example, our process is governed 
by the linear difference equation 

x k = Ax k-\ +Bu k + w k 
= x k _ } +w k 

with a measurement z G St 1 that is 

z k = Hx k + v k 

= x k + v k 

The state does not change from step to step so A = 1 . There is no control input so 
u = 0 . Our noisy measurement is of the state directly so H = 1 . (Notice that we 
dropped the subscript k in several places because the respective parameters remain 
constant in our simple model.) 

A3.2 The Filter Equations and Parameters 

Our time update equations are 

x k = x k- 1 ' 

and our measurement update equations are 

Pi , (A.32) 

— * 

P k + R 

h = *k + K k^ z k - **) > 
p k = v- K k) p \- 

Presuming a very small process variance, we let Q = le-5 . (We could certainly let 
Q = 0 but assuming a small but non-zero value gives us more flexibility in "tuning" the 
filter as we will demonstrate below.) Let's assume that from experience we know that the 
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true value of the random constant has a standard normal probability distribution, so we 
will "seed" our filter with the guess that the constant is 0. In other words, before starting 
we let x k _ j = 0 . 

Similarly we need to choose an initial value for P k _\, call it P Q . If we were absolutely 
certain that our initial state estimate x 0 = 0 was correct, we would let Pq = 0 . However 
given the uncertainty in our initial estimate x 0 , choosing Pq = 0 would cause the filter to 
initially and always believe x k = 0 . As it turns out, the alternative choice is not critical. 
We could choose almost any P 0 * 0 and the filter would eventually converge. We'll start 
our filter with P Q = 1 . 

A33 The Simulations 

To begin with, we randomly chose a scalar constant z = -0.37727 (there is no "hat" on 
the z because it represents the "truth"). We then simulated 50 distinct measurements z k 
that had error normally distributed around zero with a standard deviation of 0.1 (remember 
we presumed that the measurements are corrupted by a 0.1 volt RMS white measurement 
noise). We could have generated the individual measurements within the filter loop, but 
pre-generating the set of 50 measurements allowed me to run several simulations with the 
same exact measurements (i.e. same measurement noise) so that comparisons between 
simulations with different parameters would be more meaningful. 

In the first simulation we fixed the measurement variance at R = (0.1 ) 2 = 0.01. Because 
this is the "true" measurement error variance, we would expect the "best" performance in 
terms of balancing responsiveness and estimate variance. This will become more evident 
in the second and third simulation. Figure A. 4 depicts the results of this first simulation. 
The true value of the random constant x = -037121 is given by the solid line, the noisy 
measurements by the cross marks, and the filter estimate by the remaining curve. 
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Figure A.4: The first simulation:/? = (0.1 ) 2 = 0.01 .The true val- 
ue of the random constant x = -0.37727 is given by the solid line, 
the noisy measurements by the cross marks, and the filter estimate 
by the remaining curve. 

When considering the choice for P Q above, we mentioned that the choice was not critical 
as long as P Q * 0 because the filter would eventually converge. Below in Figure A.5 we 
have plotted the value of P k versus the iteration. By the 50 th iteration, it has settled from 
the initial (rough) choice of 1 to approximately 0.0002 (Volts 2 ). 
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Figure A.5: After 50 iterations, our initial (rough) error covariance 
P k choice of 1 has settled to about 0.0002 (Volts 2 ). 

In Figure A. 6 and Figure A.7 below we can see what happens when R is increased or 
decreased by a factor of 100 respectively. In Figure A.6 the filter was told that the 
measurement variance was 100 times greater (i.e. R = 1 ) so it was "slower" to believe 
the measurements. 
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+ 




Figure A.6: Second simulation: R = 1 . The filter is slower to re- 
spond to the measurements, resulting in reduced estimate variance. 

In Figure A .7 the filter was told that the measurement variance was 100 times smaller (i.e. 
R = 0.0001 ) so it was very "quick" to believe the noisy measurements. 




Figure A.7: Third simulation: R = 0.0001 . The filter responds to 
measurements quickly, increasing the estimate variance. 

While the estimation of a constant is relatively straight-forward, it clearly demonstrates 
the workings of the Kalman filter. In Figure A.6 in particular the Kalman "filtering" is 
evident as the estimate appears considerably smoother than the noisy measurements. 
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